pa1_0_7_user_en

8/9/2019 pa1_0_7_user_en


SAP Predictive Analysis User Guide

■ SAP Predictive Analysis 1.0.7

2012-11-19



Copyright

© 2012 SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company. Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company. Crossgate, m@gic EDDY, B2B 360°, B2B 360° Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.



Contents

Chapter 1 About this Guide
1.1 What this Guide Contains
1.2 Target Audience

Chapter 2 Release Restrictions

Chapter 3 SAP Predictive Analysis Overview

Chapter 4 Installing SAP Predictive Analysis
4.1 Installation prerequisites
4.2 To install SAP Predictive Analysis using the setup program
4.3 To uninstall SAP Predictive Analysis
4.4 Important considerations for using SAP HANA
4.4.1 To configure _SYS_REPO for the SAP Predictive Analysis user
4.4.2 Supported OLAP measures
4.5 Important considerations for using SAP BusinessObjects Universes

Chapter 5 Open-Source R Installation and Configuration
5.1 Installing R-2.15.1 and the Required Packages
5.2 Configuring R

Chapter 6 Getting Started with SAP Predictive Analysis
6.1 Basics of SAP Predictive Analysis
6.2 Launching SAP Predictive Analysis
6.3 Understanding SAP Predictive Analysis
6.3.1 Designer View
6.3.2 Results View
6.4 Using SAP Predictive Analysis from Start to Finish

Chapter 7 Building Analyses
7.1 Creating an Analysis
7.1.1 Acquiring Data from a Data Source
7.1.2 Preparing Data for Analysis
7.1.3 Applying Algorithms
7.1.4 Storing Results of the Analysis
7.2 Running the Analysis
7.3 Saving the Analysis
7.4 Viewing Results

Chapter 8 Analyzing Data
8.1 Visualization Charts
8.1.1 Scatter Matrix Chart
8.1.2 Statistical Summary Chart
8.1.3 Parallel Coordinates
8.1.4 Decision Tree
8.1.5 Regression Chart
8.1.6 Time Series Chart
8.1.7 Cluster Chart

Chapter 9 Working with Models
9.1 Creating a Model
9.2 Viewing Model Information
9.3 Exporting a Model as PMML
9.4 Deleting a Model

Chapter 10 Use Case Scenarios
10.1 Sales Forecasting
10.2 Retail Store Segmentation

Chapter 11 Component Properties
11.1 Algorithms
11.1.1 Regression
11.1.2 Outliers
11.1.3 Time Series
11.1.4 Decision Trees
11.1.5 Neural Network
11.1.6 Clustering
11.1.7 Association
11.1.8 Classification
11.2 Data Preparation Components
11.2.1 Formula
11.2.2 Sample
11.2.3 Data Type Definition
11.2.4 Filter
11.3 Data Writers
11.3.1 CSV Writer
11.3.2 JDBC Writer
11.3.3 HANA Writer
11.4 Saved Models

Appendix A More Information


About this Guide

1.1 What this Guide Contains

This guide provides:

• An overview of SAP Predictive Analysis

• Information on how to install and configure SAP Predictive Analysis

• Information on various algorithms and components available in SAP Predictive Analysis

• Information on how to create analyses and models

• Information on how to analyze data using predictive analysis visualization techniques

This guide does not cover:

• How to acquire data from various data sources

• How to perform data manipulation, data cleansing, and semantic enrichment operations in the Prepare panel

• How to share charts and datasets

Note: SAP Predictive Analysis inherits data acquisition and data manipulation functionality from SAP Visual Intelligence. Therefore, for information about workflows not covered in this guide, see the SAP Visual Intelligence User Guide available at http://help.sap.com/vi. We recommend that you read the SAP Visual Intelligence User Guide in combination with the SAP Predictive Analysis User Guide to understand the complete workflow for analyzing data using predictive analysis algorithms.

1.2 Target Audience

This guide is intended for professional data analysts, business analysts, and information designers who want to use the SAP Predictive Analysis application to analyze and visualize data using predictive algorithms.

Note: To use the SAP Predictive Analysis application, you need to be familiar with statistical and data mining algorithms and have a basic understanding of how to use these algorithms.



Release Restrictions

The following are the known issues and limitations in this release:

• The application might crash when viewing charts with large input data.

To work around this issue, you need to either remove or modify the -Xmx parameter in the SAPPredictiveAnalysis.ini file, depending on your system configuration.

• While working with the application, if the memory consumed by the application is not released, you may experience a delay in opening the document. To work around this issue, relaunch the application.

• An error occurs when you try to save the document (.SViD file) from the Predict view with saved visualizations created in the Prepare view. You may also encounter a similar error when you try to save the document (.SViD file) after using the Enrich All option in the Visualize pane of the Predict view.

To work around this issue, save the document by switching to the Prepare view.

• To enable R algorithms from within the SAP Predictive Analysis application, you need to have access rights to update files in the SAP Predictive Analysis install location. If you do not have these rights, contact your IT administrator to obtain them.

• You need to use the INSTRING function in a formula of the filter component in the following format: INSTRING('String','String') == 'true' / 'false'

• You can configure certain components in an analysis even though they are not connected to the reader component. However, when you try to run the analysis, an error occurs.

• The application hangs if you try to run an analysis in which the names of the selected columns in the acquired data set contain special characters, such as ~!@#$%^&*()_+`-={}|[]\:";'<>?,./.

To work around this issue, rename the column before navigating to the Predict view.

• The application cannot render a decision tree if there are more than 32 distinct categorical values for a dependent column.

• After acquiring data from the HANA Online data source, if you apply a filter in the Prepare view, create and execute an analysis in the Predict view, view the analysis results, and then try to navigate back to the Prepare view, the applied filter is not retained in the Prepare view.

• While installing R, if you close the SAP Predictive Analysis application, the R installation is not immediately stopped. To end the installation, you need to kill the corresponding powershell.exe process using Microsoft Windows Task Manager.

• If your existing R packages are corrupted, you cannot use the Install and Configure R option to install R packages. To use the Install and Configure R option from the application, you need to manually remove the corrupted R packages.

• Data size limits for visualizations:



Chart                   Number of Rows Supported
Scatter Matrix Chart    240 rows
Parallel Coordinates    2500 rows
Time Series Chart       3000 rows
Regression Chart        3000 rows

Note: For the Parallel Coordinates, Time Series, and Regression charts, the application takes some time to render the chart if the input data is more than 1000 rows.

• When viewing the scatter matrix chart with a large data set, the application displays the message "Loading, please wait", and the chart is not displayed.

To work around this issue, reduce the input data size, run the analysis, and view the chart.
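Several of the restrictions above can be checked before a data set is loaded. The following Python sketch is only an illustration of such a pre-check; it is not part of SAP Predictive Analysis, and all function and constant names are hypothetical. The limits are copied from the list and table above.

```python
# Characters the guide lists as causing a hang when they appear in column names.
SPECIAL_CHARS = set("~!@#$%^&*()_+`-={}|[]\\:\";'<>?,./")

# Row limits per chart, copied from the table above.
ROW_LIMITS = {
    "Scatter Matrix Chart": 240,
    "Parallel Coordinates": 2500,
    "Time Series Chart": 3000,
    "Regression Chart": 3000,
}
# Charts whose note mentions slow rendering above 1000 rows.
SLOW_CHARTS = {"Parallel Coordinates", "Time Series Chart", "Regression Chart"}
SLOW_RENDER_ROWS = 1000
MAX_TREE_CATEGORIES = 32


def columns_to_rename(column_names):
    """Return the column names that contain any of the listed special characters."""
    return [name for name in column_names
            if any(ch in SPECIAL_CHARS for ch in name)]


def can_render_decision_tree(dependent_values):
    """True if the dependent column has at most 32 distinct values."""
    return len(set(dependent_values)) <= MAX_TREE_CATEGORIES


def chart_status(chart, row_count):
    """Classify a row count against the documented limit for the given chart."""
    if row_count > ROW_LIMITS[chart]:
        return "too large"
    if chart in SLOW_CHARTS and row_count > SLOW_RENDER_ROWS:
        return "slow to render"
    return "ok"
```

Note that the underscore and the hyphen are both in the listed character set, so a column such as Net_Profit would need to be renamed before navigating to the Predict view.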



SAP Predictive Analysis Overview

SAP Predictive Analysis is a statistical analysis and data mining solution that enables you to build predictive models to discover hidden insights and relationships in your data, from which you can make predictions about future events.

With SAP Predictive Analysis, you can perform various analyses on the data, including time series forecasting, outlier detection, trend analysis, classification analysis, segmentation analysis, and affinity analysis. This application enables you to analyze data using different visualization techniques, such as scatter matrix charts, parallel coordinates, cluster charts, and decision trees.

SAP Predictive Analysis offers a range of predictive analysis algorithms, supports the use of the R open-source statistical analysis language, and offers in-memory data mining capabilities for handling large volume data analysis efficiently.

Note: SAP Predictive Analysis inherits data acquisition and data manipulation functionality from SAP Visual Intelligence. SAP Visual Intelligence is a data manipulation and visualization tool. Using SAP Visual Intelligence, you can connect to various data sources such as flat files, relational databases, in-memory databases, and SAP BusinessObjects universes. You can operate on different volumes of data, from a small matrix of data in a CSV file to a very large dataset in SAP HANA, and you can select, clean, and manipulate data.



Installing SAP Predictive Analysis

4.1 Installation prerequisites

Before installing SAP Predictive Analysis, make sure the following requirements are met:

• You must have the Microsoft Windows 7 operating system installed on your machine. SAP Predictive Analysis is supported on both 32-bit and 64-bit machines.

• If you have already installed SAP Visual Intelligence on your machine, you need to uninstall it before installing SAP Predictive Analysis.

• You must have administrator rights to install SAP Predictive Analysis on the computer.

• Sufficient disk space must be available on the following resources:

Resource                                          Required Space
Drive hosting the User application data folder    2.5 GB
User temporary folder (\AppData\Local\Temp)       200 MB
Drive hosting the installation directory          500 MB

• The following ports must be available:

Port                               Required by
6401                               Sybase IQ database
Any port in the range 4520-4539    SAP Predictive Analysis installation

For a detailed list of supported environments and hardware requirements, see the Product Availability Matrix at: http://service.sap.com/pam
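The port and disk space prerequisites can be checked with a short script before running the setup program. The following Python sketch is an illustration only and is not shipped with the product. It assumes the listed ports are TCP ports on the local machine (the guide does not say so explicitly), and the function names are hypothetical.

```python
import shutil
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to the port on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def free_gb(path):
    """Free space, in GB, on the drive that hosts the given path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def check_ports():
    """Check port 6401 and whether any port in 4520-4539 is available."""
    sybase_ok = port_is_free(6401)
    install_ok = any(port_is_free(p) for p in range(4520, 4540))
    return sybase_ok, install_ok
```

For example, `free_gb("C:\\")` could be compared against the 500 MB needed for the installation directory, and `check_ports()` returns a pair of booleans for the two port requirements.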

4.2 To install SAP Predictive Analysis using the setup program

1. Run the setup.exe file.

The "User Account Control" dialog box appears with a warning message.

2. Choose Yes in the confirmation prompt.



3. Specify the destination folder for installing SAP Predictive Analysis.

• To accept the default installation directory, choose Next.

• To navigate to the folder where you want to install SAP Predictive Analysis, choose Browse. Select the required folder and choose Next.

The "License Agreement" page opens.

4. Review the license agreement, select I accept the License Agreement, and choose Next.

5. To begin the installation, choose Next.

The installation is complete when the "Finish Installation" page opens.

6. To exit this installation, choose Finish.

4.3 To uninstall SAP Predictive Analysis

1. Choose Start > Control Panel > Programs.

2. Choose Uninstall a program.

3. Right-click SAP Predictive Analysis and choose Uninstall.

The SAP Predictive Analysis Setup wizard appears.

4. On the Confirm Uninstall page, choose Next.

5. To complete the uninstallation, choose Finish.

4.4 Important considerations for using SAP HANA

This section contains important considerations and requirements for using SAP Predictive Analysis with the SAP HANA database.

Security requirements for publishing to SAP HANA

Before users can publish content to SAP HANA, they must be assigned specific privileges and roles. These roles and privileges are also required for retrieving data from SAP HANA. Use the SAP HANA Studio application to assign user roles and privileges. For information on administering the SAP HANA database and using SAP HANA Studio, see SAP HANA Database – Administration Guide. For information on user security, see the SAP HANA Security Guide (Including SAP HANA Database Security).

The user account used to log into the SAP HANA system from SAP Predictive Analysis must be assigned the "MODELING" role (in SAP HANA).

Note: This action can only be performed by a user with ROLE_ADMIN privileges on the SAP HANA database.

When an SAP Predictive Analysis user logs into the SAP HANA system, the internal _SYS_REPO account must:



• Be granted the SELECT SQL Privileges.

• Have the Grantable to others option selected in the (SAP Predictive Analysis) user's schema.

4.4.1 To configure _SYS_REPO for the SAP Predictive Analysis user

If an account for the SAP Predictive Analysis user is already defined in the SAP HANA system:

1. From the system connection in the SAP HANA Studio Navigator window, choose Catalog > Authorization > Users.

2. Double-click the _SYS_REPO account.

3. On the SQL Privileges tab, click the + icon, enter the name of the user's schema, and choose OK.

4. Choose SELECT and the corresponding Yes under Grantable to others.

5. Choose Deploy or Save.

Note: Users can also open an SQL editor in SAP HANA Studio and run the following SQL statement:

GRANT SELECT ON SCHEMA <user_account_name> TO _SYS_REPO WITH GRANT OPTION
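When the statement above has to be produced for several users, it can be generated rather than typed. The following Python sketch is an illustration only; the function name is hypothetical. It wraps the schema name in double quotes, which is the standard SQL form for a delimited identifier. Delimited identifiers are case-sensitive, so the sketch assumes the name is passed exactly as it is stored in the database.

```python
def grant_select_statement(schema_name):
    """Build the GRANT statement from the note above for one user schema."""
    # Double any embedded double quote, then wrap the name in double quotes.
    quoted = '"' + schema_name.replace('"', '""') + '"'
    return f"GRANT SELECT ON SCHEMA {quoted} TO _SYS_REPO WITH GRANT OPTION"


# Example with a hypothetical schema name.
print(grant_select_statement("PA_USER"))
```

The generated text can then be pasted into the SQL editor in SAP HANA Studio.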

4.4.2 Supported OLAP measures

SAP HANA supports only the following measures of aggregation in OLAP data sources:

• SUM

• MIN

• MAX

• COUNT

If your dataset contains an aggregation on a measure that is not listed above, the aggregation will be ignored by SAP HANA during publication and it will not be part of the final published artifact.
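To see in advance which aggregations would be dropped on publication, the measures of a dataset can be compared against the supported list. The following Python sketch is an illustration only. The function name, the dictionary layout, and the aggregation label AVERAGE are assumptions made for the example.

```python
# Aggregations the guide lists as supported by SAP HANA for OLAP data sources.
SUPPORTED_AGGREGATIONS = {"SUM", "MIN", "MAX", "COUNT"}


def split_measures(measures):
    """Split {measure name: aggregation} into (kept, ignored) dictionaries."""
    kept = {name: agg for name, agg in measures.items()
            if agg.upper() in SUPPORTED_AGGREGATIONS}
    ignored = {name: agg for name, agg in measures.items()
               if agg.upper() not in SUPPORTED_AGGREGATIONS}
    return kept, ignored


# Hypothetical dataset with one supported and one unsupported aggregation.
kept, ignored = split_measures({"Revenue": "SUM", "Price": "AVERAGE"})
```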

4.5 Important considerations for using SAP BusinessObjects Universes

• To acquire data from universes that exist on the BI 4.0 platform, ensure that the Web Intelligence Server is running.

• You also need to ensure that your Business Intelligence platform is at BI 4.0 SP2 patch level 14 or above.



Note: You can also acquire data from universes that exist on BI 4.0 SP3 and BI 4.0 SP4 platforms.



Open-Source R Installation and Configuration

5.1 Installing R-2.15.1 and the Required Packages

To use open-source R algorithms in your analysis, you need to install the R environment and configure it with the SAP Predictive Analysis application.

SAP Predictive Analysis provides an option to install and configure R 2.15.1 and the required packages from within the application. Ensure that you are connected to the Internet while installing R.

To install the R environment and the required packages, perform the following steps:

1. Launch the SAP Predictive Analysis application.

2. From the File menu, choose Install and Configure R.

3. Select Install R.

4. Read the open-source R license agreement and the important instructions, and select I agree to install R using the script.

5. Select OK.

Note: If you have already installed R 2.15.x, you can use this procedure to install the required R packages.

5.2 Configuring R

After you have installed R, you need to configure the R environment to enable R algorithms in the application. If you have already installed R-2.11.1 or R-2.15.1 and the required packages, you can skip the R installation step and directly configure R.

Note: Before configuring R-2.11.1, you need to set certain environment variables. For example, if you have installed R at C:\Program Files\R\R-2.11.1, then you need to set the environment variables as follows:

• R_HOME = C:\Program Files\R\R-2.11.1

• R_LIBS = C:\Program Files\R\R-2.11.1\library

• Path = existing path; C:\Program Files\R\R-2.11.1\library\rJava\jri; C:\Program Files\R\R-2.11.1\bin
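The three values follow a fixed pattern relative to the R installation folder, so they can be derived for any install location. The following Python sketch is an illustration only; the function name is hypothetical, and it computes the values without changing any system setting.

```python
import ntpath  # Windows path rules, usable on any platform


def r_environment(r_home, existing_path=""):
    """Derive the environment variable values described above from R_HOME."""
    r_libs = ntpath.join(r_home, "library")
    additions = [ntpath.join(r_libs, "rJava", "jri"), ntpath.join(r_home, "bin")]
    parts = ([existing_path] if existing_path else []) + additions
    return {
        "R_HOME": r_home,
        "R_LIBS": r_libs,
        "Path": ";".join(parts),
    }


# Example using the install location from the note above.
env = r_environment(r"C:\Program Files\R\R-2.11.1", r"C:\Windows")
for name, value in env.items():
    print(f"{name} = {value}")
```

The printed values can then be entered in the Windows environment variable settings.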



To configure R, perform the following steps:

1. Launch the SAP Predictive Analysis application.

2. From the File menu, choose Install and Configure R.

3. On the Configuration tab, select Enable Open Source R Algorithms.

4. Choose Browse to select the R installation folder.

5. Choose OK.

The "User Account Control" dialog box appears with a warning message.

6. Choose Yes in the confirmation prompt.



Getting Started with SAP Predictive Analysis

6.1 Basics of SAP Predictive Analysis

Component

A component is the basic processing unit of SAP Predictive Analysis. Each component contains input and/or output anchors (connection points). These anchors are used to connect components through connectors. When you connect components together, data is transmitted from predecessor components to their successor components.

SAP Predictive Analysis consists of the following components:

• Data preparation

• Algorithms

• Data writers

You can access components from the Designer view of the Predict panel. After you have added components to the analysis editor, the status icon of a component allows you to identify its state.

The following are the states of a component:

• (Not Configured): This state is displayed when you drag a component onto the analysis editor. It indicates that the component needs to be configured before running the analysis.

• (Configured): This state is displayed once all the necessary properties are configured for the component.

• (Success): This state is displayed after the successful execution of the analysis.

• (Failure): This state is displayed if this component causes the execution of the analysis to fail.

Analysis

An analysis is a series of different components connected together in a particular sequence with connectors, which define the direction of the data flow.



Model

A model is a reusable component created by training an algorithm using historical data.

In-Database (In-DB)

In-database (in-DB) is an analysis execution mode in which data processing is performed within the database using data mining capabilities. In this mode, the data is never taken out of the database for processing, which avoids data transfer and makes processing fast. This mode can be used to process large datasets. SAP HANA supports in-DB data mining through R integration and the Predictive Analysis Library (PAL).

In-Process (In-Proc)

In-Process is an analysis execution mode in which the data processing is performed by taking data out of the database into the predictive analysis process space. This type of analysis is also referred to as Out-DB analysis.
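The difference between the two execution modes can be illustrated with a small Python sketch. It uses the standard library sqlite3 module as a stand-in database; this is an assumption made purely for illustration, and it does not use SAP HANA, PAL, or any SAP API. The table and function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 30.0), ("south", 20.0)],
    )
    return conn


def mean_in_process(conn):
    """In-process style: pull every row out of the database, then compute."""
    rows = conn.execute("SELECT amount FROM sales").fetchall()
    values = [row[0] for row in rows]
    return sum(values) / len(values)


def mean_in_database(conn):
    """In-database style: the database computes; only the result leaves it."""
    return conn.execute("SELECT AVG(amount) FROM sales").fetchone()[0]


conn = make_demo_db()
print(mean_in_process(conn), mean_in_database(conn))
```

Both functions return the same value. The in-process version transfers every row to the client, while the in-database version transfers a single number, which is why the in-database mode suits large datasets.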

6.2 Launching SAP Predictive Analysis

To launch SAP Predictive Analysis, choose Start > All Programs > SAP Business Intelligence > SAP Predictive Analysis > SAP Predictive Analysis.

6.3 Understanding SAP Predictive Analysis

When you launch SAP Predictive Analysis, the home page appears. The home page contains information that helps you get started with SAP Predictive Analysis.

To start analyzing data using SAP Predictive Analysis, you need to first connect to the data source and acquire data for analysis. After acquiring data, you can perform the following operations on data:

• Prepare data for analysis by applying data manipulation and data cleansing functions

• Analyze data by applying data mining and statistical analysis algorithms

• Share datasets and charts with external collaborators

Note:

This guide describes how to analyze data by applying data mining and statistical analysis algorithms. For information on how to acquire data, prepare data, and share datasets, see the SAP Visual Intelligence User Guide available at http://help.sap.com/vi.

Once you have acquired data from the data source, you need to switch to the Predict panel to analyze data.



6.3.1 Designer View

The Designer view enables you to design and run analyses, and to create predictive models.

6.3.2 Results View

The Results view enables you to understand data and analysis results by using various visualization techniques and intuitive charts.



6.4 Using SAP Predictive Analysis from Start to Finish

The following is an overview of the process you can follow to build a chart based on a dataset. The process is not a linear one, and you can move from one step back to a preceding step to fine-tune your chart or data.

Step: Connect to your data source.

Note: For information on how to connect to your data source, see the Connecting to your data source section of the SAP Visual Intelligence User Guide.

Description: If your data source is:

• RDBMS: Enter your credentials, connect to the database server, browse and select a data source; for example, if you are connecting to SAP HANA, you select a view and cube to build your chart.

• Flat file: Choose the columns to be acquired, trimmed, or shown and hidden.

• Universe: Enter your universe credentials, connect to the Central Management Server repository, and select a universe to build your chart.

Step: View and organize the columns and attributes.

Note: For information on how to view columns and attributes, see the Preparing your data section of the SAP Visual Intelligence User Guide.

Description: You can view the data acquired as columns or as facets. You can organize the data display to make chart building easier by doing the following:

• Create filters and hide unneeded columns

• Create measures, time hierarchies, and geography hierarchies

• Clean and organize the data in columns using a range of manipulation tools

• Create columns with formulas using a wide selection of available functions

Step: Analyze your data using predictive analysis algorithms.

Note: This guide provides information on how to analyze data using predictive analysis algorithms.

Description: Once you have acquired the relevant data in the Prepare panel, switch to the Predict panel and create an analysis to find patterns in the data and predict the future outcomes.

In the Predict panel, you can do the following:

• Create an analysis

• Build predictive models

• View analysis results

• View model visualizations

• Build charts

Note: For information on building charts, see the Visualizing your data section of the SAP Visual Intelligence User Guide.

Step: Save your analysis.

Description: Name and save the analysis that includes your charts. The analysis is saved in a document with the file format .SViD in the application folder under Documents in your profile path.



Building Analyses

7.1 Creating an Analysis

You can use SAP Predictive Analysis to perform data mining and statistical analysis by running data through a series of components. The series of components must be connected to each other with connectors, which define the direction of the data flow. This process is referred to as analysis. Using analysis, you can read data from a data source, analyze data by applying data manipulation functions and data mining and statistical algorithms, and store the results of the analysis.

To create an analysis, perform the following steps:

1. Acquire data from a data source

2. (Optional) Prepare the data for analysis (for example, by filtering the data)

3. Apply algorithms

4. (Optional) Store the results of the analysis for further analysis
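The steps above amount to a chain of components through which rows flow in one direction. The following Python sketch illustrates that idea only. It is not the application's API, and every name in it (`read_data`, `keep_recent`, `add_running_mean`, `run_analysis`) is hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Component = Callable[[List[Row]], List[Row]]


def read_data() -> List[Row]:
    """Step 1, data reader: stands in for a CSV file or database source."""
    return [{"month": float(m), "sales": 100.0 + 10.0 * m} for m in range(1, 7)]


def keep_recent(rows: List[Row]) -> List[Row]:
    """Step 2 (optional), data preparation: filter rows on a range value."""
    return [row for row in rows if row["month"] >= 3]


def add_running_mean(rows: List[Row]) -> List[Row]:
    """Step 3, algorithm: add a 'predicted' column (a simple running mean)."""
    result, total = [], 0.0
    for count, row in enumerate(rows, start=1):
        total += row["sales"]
        result.append({**row, "predicted": total / count})
    return result


def run_analysis(source: Callable[[], List[Row]],
                 components: List[Component]) -> List[Row]:
    """Pass the data through each connected component, in order."""
    rows = source()
    for component in components:
        rows = component(rows)
    return rows


result = run_analysis(read_data, [keep_recent, add_running_mean])
for row in result:
    print(row)
```

In this picture, a data writer (step 4) would simply be one more function appended to the component list.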

Related Topics

• Acquiring Data from a Data Source

• Preparing Data for Analysis

• Applying Algorithms

• Storing Results of the Analysis

7.1.1 Acquiring Data from a Data Source

1. On the Home page, choose the New Document button in the top left corner.

2. Connect to or browse to your data source.

You can acquire data from the following data sources:

Data Source: Description

• CSV file: You can acquire data from a comma-separated value data file and perform in-process (in-proc) analysis using SAP and R algorithms.

• Free hand SQL: You can create your own data provider by manually entering the SQL for a target data source and perform in-process (in-proc) analysis using SAP and R algorithms.

• SAP HANA Offline: You can acquire data from SAP HANA tables, views, and analysis views and perform in-process (in-proc) analysis using SAP and R algorithms.

• SAP HANA Online: You can acquire data from SAP HANA tables, views, and analysis views and perform in-database (in-db) analysis using HANA PAL algorithms.

• MS Excel: You can acquire data from a Microsoft Excel spreadsheet and perform in-process (in-proc) analysis using SAP and R algorithms.

• Universe 3.x: You can acquire data from SAP BusinessObjects universes that exist on the XI 3.x platform and perform in-process (in-proc) analysis using SAP and R algorithms.

• Universe 4.x: You can acquire data from SAP BusinessObjects universes that exist on the BI 4.x platform and perform in-process (in-proc) analysis using SAP and R algorithms.

3. Choose Acquire or Select as required.

The columns appear in the Data pane, and the attributes and measures appear to the left in the Semantic pane. You are now ready to start building your analysis. In the Predict panel, the configured data reader component is added to the analysis editor. You can run the analysis to see the results of the data reader component.

Note: For information on how to connect to a specific data source, see the SAP Visual Intelligence User Guide available at http://help.sap.com/vi.

7.1.2 Preparing Data for Analysis

This is an optional step.



In many cases, the raw data from the data source may not be suitable for analysis. For accurate results, you may need to prepare and process the data before analysis. You can find data manipulation functions in the Prepare panel and data preparation functions in the Predict panel.

Data preparation involves checking data for accuracy and missing fields, filtering data based on range values, sampling the data to investigate a subset of data, and manipulating data. You can process data using data preparation components.
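As a rough illustration of what these preparation operations do to a dataset, the Python sketch below checks for missing fields, filters on a range of values, and draws a sample. The rows, column names, and range limits are invented for the example. In the application, these operations are performed by data preparation components rather than by code.

```python
import random

# Hypothetical input rows; None marks a missing field.
rows = [
    {"store": "A", "sales": 120.0},
    {"store": "B", "sales": None},
    {"store": "C", "sales": 310.0},
    {"store": "D", "sales": 95.0},
    {"store": "E", "sales": 240.0},
]

# 1. Check for missing fields and keep only complete records.
complete = [row for row in rows if row["sales"] is not None]

# 2. Filter the data based on a range of values.
in_range = [row for row in complete if 100.0 <= row["sales"] <= 300.0]

# 3. Sample the data to investigate a subset (seeded for repeatability).
rng = random.Random(42)
sample = rng.sample(in_range, k=1)

print(len(complete), [row["store"] for row in in_range], sample)
```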

1. In the Predict panel, double-click the required data preparation component from the Data Preparation tab.

The data preparation component is added to the analysis editor and an automatic connection is created to the data reader component.

2. Right-click the data preparation component and choose Configure Properties.

3. In the component properties dialog box, enter the necessary details for the data preparation component properties.

4. Choose Save and Close.

5. To view the results of the data reader component and the data preparation component, choose [icon].

Related Topics

• Data Preparation Components

7.1.3 Applying Algorithms

Once you have the relevant data for analysis, you need to apply appropriate algorithms to determine patterns in the data.

Determining an appropriate algorithm to use for a specific purpose is a challenging task. You can use a combination of a number of algorithms to analyze data. For example, you can first use time series algorithms to smooth data and then use regression algorithms to find trends.

The following table provides information on which algorithm to choose for specific purposes:

Time Series Algorithms

Purpose: Performing time-based predictions

• Triple Exponential Smoothing

• R-Single Exponential Smoothing

• R-Double Exponential Smoothing

• R-Triple Exponential Smoothing

Regression Algorithms

Purpose: Predicting continuous variables based on other variables in the dataset

• Linear Regression

• Exponential Regression

• Geometric Regression

• Logarithmic Regression

• HANA Multiple Linear Regression

• R-Linear Regression

• R-Exponential Regression

• R-Geometric Regression

• R-Logarithmic Regression

• R-Multiple Linear Regression

Association Algorithms

Purpose: Finding frequent itemset patterns in large transactional datasets to generate association rules

• HANA Apriori

• R-Apriori

Clustering Algorithms

Purpose: Clustering observations into groups of similar itemsets

• HANA K-Means

• K-Means

Decision Trees

Purpose: Classifying and predicting one or more discrete variables based on other variables in the dataset

• HANA C4.5

• R-CNR Tree

Outlier Detection Algorithms

Purpose: Detecting outlying values in the dataset

• Inter Quartile Range

• Nearest Neighbour Outlier

Neural Network Algorithms

Purpose: Forecasting, classification, and statistical pattern recognition

• R-NNet Neural Network

• R-MONMLP Neural Network

1. In the Predict panel, double-click the required algorithm component from the Algorithms tab.



The algorithm component is added to the analysis editor and is connected to the previous component in the analysis.

2. Right-click the algorithm component and choose Configure Properties.

3. In the component properties dialog box, enter the necessary details for the algorithm component properties.

4. Choose Save and Close.

5. To view the results of the data reader component, data preparation component, and algorithm, choose [icon].

Related Topics

• Algorithms

7.1.4 Storing Results of the Analysis

This is an optional step.

You can store the results of the analysis in flat files or databases for further analysis using data writer components.

1. In the Predict panel, double-click the required data writer component from the Data Writers tab.

The data writer component is added to the analysis editor and is connected to the previous component in the analysis.

2. Right-click the data writer component and choose Configure Properties.

3. In the component properties dialog box, enter the necessary details for the data writer component properties.

4. Choose Save and Close.

5. To view the results of the data reader component, data preparation component, algorithm, and data writer component, choose [icon].

Related Topics

• Data Writers

7.2 Running the Analysis



To run the analysis, choose [icon] in the analysis editor toolbar, or right-click the last component in the analysis and choose Run Analysis.

If your analysis is very large and complex, you can run it component by component and analyze the data at each stage. To run part of the analysis, choose [icon] in the analysis editor toolbar, or right-click the component up to which you want to run and choose Run Till Here.

7.3 Saving the Analysis

After creating an analysis, you can save it for reuse in the future. In SAP Predictive Analysis, you need to save the document to save the corresponding analysis. The document is saved in the .SViD file format. The saved document contains the dataset (data reader component) you acquired from the data source and the analysis you created.

To save an analysis in a document, perform the following steps:

1. Choose File > Save.

2. Enter a name for the document.

3. Choose Save.

If you create multiple analyses using the same dataset, all of the analyses are saved in the same document. You can access all of the analyses in a document through the Change drop-down list.

To add a new analysis to the document, choose [icon] in the analysis toolbar. To rename the analysis, choose [icon] and enter a new name. To delete an existing analysis from the document, choose [icon].

Note: Results from the execution of components are not saved with analyses. To view component results, you need to execute the analysis again.

7.4 Viewing Results

To view the results of components in an analysis, after running the analysis, right-click the component and choose View Results. The Results view is displayed.



Analyzing Data

After the successful execution of the analysis, the result of each component in the analysis is represented using different visualization charts.

To analyze data, perform the following steps:

1. After running an analysis, switch to the Results view by choosing the Results button in the toolbar.

2. From the Component Selector pane, choose the required component in the analysis to view its visualization.

By default, the result of the component is displayed in the Grid pane. You can switch to the Charts pane to view the result of the component in the corresponding visualization chart. In addition, you can build your own chart in the Visualize pane.

The following table summarizes components and their supported visualization charts.

Components: Visualization Charts

• Data Readers and Data Preparation: Scatter Matrix Chart, Statistical Summary Chart, and Parallel Coordinates

• Clustering Algorithms: Cluster Graph and Algorithm Summary

• Decision Trees: Decision Tree and Algorithm Summary

• Time Series Algorithms: Time Series Graph and Algorithm Summary

• Regression Algorithms: Regression Graph and Algorithm Summary

8.1 Visualization Charts

8.1.1 Scatter Matrix Chart

Scatter matrix charts are matrices of charts (n*n charts, where n is the number of selected attributes) used to compare data across different dimensions. By default, a maximum of four continuous attributes are selected for analysis, starting from the first attribute in the source data, and a 4*4 matrix of charts is plotted. However, you can manually select the required attributes from the Settings option and refresh the visualization by choosing Apply.

Note:

You can select a maximum of four continuous attributes in the Settings option.

8.1.2 Statistical Summary Chart

Statistical Summary provides summary information for continuous attributes in the data source. The summary information includes count, minimum value, maximum value, variance, standard deviation, sum, average, range, and number of records. A histogram chart is plotted for each attribute.
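The summary measures listed above can be reproduced for a single continuous attribute with Python's standard `statistics` module, as in the sketch below. The guide does not say whether the application reports sample or population variance. The population forms (`pvariance`, `pstdev`) are used here purely as an assumption, and the function name and sample values are hypothetical.

```python
import statistics


def statistical_summary(values):
    """Summary measures for one continuous attribute."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "average": statistics.mean(values),
        "range": max(values) - min(values),
        # Assumption: population variance and standard deviation.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


summary = statistical_summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in summary.items():
    print(f"{name}: {value}")
```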

8.1.3 Parallel Coordinates



Parallel coordinates is a visualization technique used to visualize multi-dimensional data and multivariate patterns in the data for analysis.

In this chart, by default, the first five attributes are represented as vertically-spaced parallel axes. To choose the subset of attributes to be viewed in the chart, use the Settings option. Each axis is labeled with the attribute name and the minimum and maximum values for the attribute. Each observation is represented as a series of connected points along the parallel axes. You can select the color by option to filter the data based on the categorical value.

Note: You can select a maximum of seven continuous attributes in the Settings option.

8.1.4 Decision Tree

A decision tree is a visualization technique that enables you to classify observations into groups and predict future events based on the set of decision rules.

This presentation is used for decision tree analysis. In this technique, a binary decision tree is built by splitting observations into smaller sub-groups until the stopping criterion is met. The leaf node indicates classified data. You can enlarge the decision tree by choosing the zoom-in button.

Note:

• The application cannot render a decision tree if there are more than 32 categorical values for a dependent column.

• The look and feel of the decision tree differs based on the algorithm vendor. For example, the decision tree for the R-CNR Tree algorithm is different from the decision tree for the HANA C4.5 algorithm.



Each node in the decision tree represents the classification of data at that level. You can view node contents by choosing [icon] on each node.

8.1.5 Regression Chart

A regression chart is used to visualize the correlation between the dependent and independent variables. In trend mode, you can analyze the performance of the algorithm by comparing the actual dependent variables with predicted values, where dependent variables are represented as a bar graph and predicted values are represented as a line graph. In fill mode, the algorithm fills the missing values and displays the output as a line graph.



If the dataset is very large, the graph may be unclear. For better visibility of data, use the Range selector located at the bottom of the graph to select a specific data range from the large dataset. The data in the selected area is displayed in the visualization editor.

8.1.6 Time Series Chart

A time series chart enables you to visualize time series data in comparison with the fitted or predicted values from the algorithm. You can use this chart to view the data forecasted over a specified period. In trend mode, a dependent variable is represented as a bar graph and trend values are represented as a line graph. In predict mode, a dependent variable is represented as a bar graph and predicted values are represented as a line graph.

If the dataset is very large, the graph may be unclear. For better visibility of data, use the Range selector located at the bottom of the graph to select a specific data range from the large dataset. The data in the selected area is displayed in the visualization editor.



8.1.7 Cluster Chart

A cluster graph is a visualization technique that uses different charts to represent cluster information such as cluster size, cluster density and distance, cluster variable comparison, and cluster comparison.

Note: If you use the HANA K-Means algorithm to cluster observations, then only cluster size and cluster variable comparison information are represented as charts.

Cluster Size

Cluster size is the number of elements in each cluster and is represented by a horizontal bar chart.

However, you can also visualize the cluster size in a pie chart or a vertical bar chart.

Cluster Density and Distance

The distance between clusters and the density of each cluster are represented by a network chart. Each node in the network represents a cluster and its size. The color of the node represents density. You can enlarge the network chart by choosing [icon].

Cluster Variable Comparison

The comparison of the total distribution of all clusters against the distribution of each cluster is represented by a histogram. You can select the required attribute of the cluster from the variable drop-down list and change the cluster using the slider.

Cluster Comparison

The R-K-Means algorithm computes center points for each input attribute in each cluster. The comparison of each center point and cluster is represented by the radar chart. You can select the Normalize Result option to view the chart with the normalized data. In the normalized mode, the data is represented in the range of 0 to 1.
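The guide states only that normalized values fall in the range of 0 to 1. One common way to achieve that is min-max scaling, sketched below in Python as an assumption about what such a normalization could look like. The cluster center values and attribute names are hypothetical.

```python
def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


# Hypothetical center points of three clusters for two attributes.
centers = {
    "turnover": [120.0, 480.0, 300.0],
    "staff": [8.0, 35.0, 20.0],
}

normalized = {name: normalize(values) for name, values in centers.items()}
print(normalized["turnover"])
```

Scaling each attribute separately keeps attributes with large units (such as turnover) from dominating attributes with small units (such as staff) on a shared radar axis.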



Working with Models

A model is a reusable component created by training an algorithm using historical data and saving the instance.

Typically, you create models for the following reasons:

• To share computed business rules that can be applied to similar data

• To quickly analyze results without the historical data by using the trained instance of the algorithm

9.1 Creating a Model

To create a model, you need to save the state of the algorithm.

1. Acquire data from the required data source.

The data source component is added to the analysis editor in the Predict panel.

2. In the Predict panel, double-click the required algorithm component.

3. Right-click the algorithm component and choose Configure Properties.

4. Configure the algorithm properties in the dialog box.

a. Enter the necessary values for the algorithm properties.

b. Under Model Information, choose Save the Model.

c. Enter a model name and description.

d. If you want to overwrite the existing model with a new model, select Overwrite, if exists.

e. Choose Save and Close.

5. Choose [icon].

The model is created and appears on the Saved Models tab. You can use this model just like any other component for creating an analysis.

Note: Independent column names used while scoring the model should be the same as independent column names used while creating the model.

9.2 Viewing Model Information



Model information includes:

• Column details such as which columns were used while generating the model

• Summary of the algorithm

This information is helpful for data analysts to understand the structure of the model.

To view model information, perform the following steps:

1. In the Predict panel, from the Saved Models tab, double-click the required model.

The Saved Models tab appears only if the models are already saved in the repository.

2. Right-click the model and choose View Model Information.

The corresponding visualization for the algorithm selected while generating the model is displayed.

9.3 Exporting a Model as PMML

You can export the model information into a local file in industry-standard Predictive Modeling Markup Language (PMML) format and share the model with other PMML compliant applications to perform analysis on similar data.

To export a model in PMML format, perform the following steps:

1. Create a model.

2. In the Predict panel, from the Saved Models tab, double-click the required model.

3. Right-click the model and choose Export As PMML.

4. Enter a name for the file.

5. Select the file type, either PMML or XML, as required.

6. Choose Save.
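To give a sense of how another application might consume an exported file, the Python sketch below parses a small PMML regression model and scores one record with it. The PMML text is a hand-written, hypothetical example and not actual output from SAP Predictive Analysis. The field names, coefficients, and PMML version are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document describing y = 10.0 + 2.5 * store_size.
PMML_TEXT = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical linear regression model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="store_size" optype="continuous" dataType="double"/>
    <DataField name="turnover" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="store_size"/>
      <MiningField name="turnover" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="store_size" coefficient="2.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NAMESPACE = {"p": "http://www.dmg.org/PMML-4_1"}

root = ET.fromstring(PMML_TEXT.strip())
table = root.find("p:RegressionModel/p:RegressionTable", NAMESPACE)
intercept = float(table.get("intercept"))
coefficients = {
    predictor.get("name"): float(predictor.get("coefficient"))
    for predictor in table.findall("p:NumericPredictor", NAMESPACE)
}


def score(record):
    """Apply the regression table to one record of independent values."""
    return intercept + sum(coefficients[name] * record[name] for name in coefficients)


prediction = score({"store_size": 100.0})
print(prediction)
```

The lookup by column name in `score` is one reason column names at scoring time need to match those used when the model was built.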

9.4 Deleting a Model

We recommend that you use this option with caution, since deleting a model might make the analysis that contains the model's reference unusable.

To delete a model, perform the following steps:

1. In the Predict panel, choose the Saved Models tab.

2. Hover on the required model and choose the Delete icon.



Use Case Scenarios

This section provides use case scenarios that describe how you can use SAP Predictive Analysis to analyze data and forecast future events.

10.1 Sales Forecasting

Scenario: The regional manager of an airline company wants to develop strategies to increase business and fine-tune operations. The airline's passenger data, such as flight date and number of passengers traveled, is stored in a CSV file. The manager would like to analyze the trend in business since 2000 and wants to forecast the number of passengers flying in the next year (for example, 2012).

This example assumes that the manager has some basic knowledge of statistical analysis and data mining techniques.

Using SAP Predictive Analysis, the manager creates a forecasting analysis. Since the airline passenger data is seasonal in nature, he selects the Triple Exponential Smoothing algorithm for forecasting.

To create an analysis for forecasting airline passengers, proceed as follows:


2. From the toolbar, choose New Document.

3. Choose CSV.

4. Choose Browse and select the Airline Passenger.csv file.

5. Choose Acquire.

6. Switch to the Predict panel.

7. From the Algorithms tab, double-click the Triple Exponential Smoothing algorithm.

The algorithm component is automatically connected to the data reader component.

8. Right-click the Triple Exponential Smoothing algorithm and choose Configure Properties.

9. In the Triple Exponential Smoothing properties dialog box, provide the necessary details:

a. Select Forecast as the output mode, as you want to forecast the data.

b. Select the Airline Passenger column as the dependent column. The algorithm forecasts the data based on the Airline Passenger column.

c. In the Missing Values field, select Remove.

d. In the Period field, select Month(12).

e. Enter 2000 as the start year.

f. Enter 1 as a start period. As the period is Month(12), 1 implies first month of the year (January).



g. Enter 12 for the number of periods to predict.

h. Retain the default values for the advanced properties.

i. Choose Save and Close.

10. From the Data Writers tab, double-click the CSV Writer component.

11. Right-click the CSV Writer component and choose Configure Properties.

12. In the CSV Writer properties dialog box, select a CSV file to store the result.


14. Choose [icon] to run the analysis.

The fitted and forecast results are stored in the CSV file.

15. Switch to the analysis visualization view.

16. In the Components Selector pane, select Triple Exponential Smoothing.

By default, the results of the component are displayed in the Grid pane.

17. To view the visualization chart, switch to the Charts pane.

18. From the File menu, choose Save.


20. Choose Save.
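For readers who want to see what a triple exponential smoothing forecast computes, the following Python sketch implements the additive Holt-Winters method with a period of 12 and a 12-period forecast, mirroring the settings in the steps above. It is a sketch under stated assumptions, not the application's implementation: the initialization scheme, the smoothing factors (`alpha`, `beta`, `gamma`), and the passenger figures are all hypothetical.

```python
def triple_exponential_smoothing(series, period, n_forecast,
                                 alpha=0.3, beta=0.1, gamma=0.1):
    """Additive Holt-Winters smoothing.

    Returns (fitted, forecast): one-step-ahead fitted values for the input
    series, and n_forecast predicted values beyond its end.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    first = series[:period]
    second = series[period:2 * period]
    level = sum(first) / period                         # initial level
    trend = (sum(second) - sum(first)) / period ** 2    # initial per-step trend
    seasonal = [value - level for value in first]       # initial seasonal offsets

    fitted = []
    for t, observed in enumerate(series):
        season = seasonal[t % period]
        fitted.append(level + trend + season)
        previous_level = level
        level = alpha * (observed - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (observed - level) + (1 - gamma) * season

    n = len(series)
    forecast = [level + (h + 1) * trend + seasonal[(n + h) % period]
                for h in range(n_forecast)]
    return fitted, forecast


# Hypothetical monthly passenger counts: the same 12-month pattern for 3 years.
one_year = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
passengers = one_year * 3

fitted, forecast = triple_exponential_smoothing(passengers, period=12, n_forecast=12)
print([round(value, 1) for value in forecast])
```

Because the sample series has no trend and a perfectly repeating season, the forecast reproduces the 12-month pattern. Real passenger data would also exercise the trend term.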

10.2 Retail Store Segmentation

Scenario: The country manager of a retail chain (which has 150 stores) is finalizing plans for three sales promotion strategies. Data pertaining to stores, such as store location, sales turnover, store size, staff, and profit margin, is stored in a CSV file. The manager wants to segment the 150 stores into three different groups based on sales turnover, profit margin, store size, and staff size so that specific strategies can be applied to each store segment.

This example assumes that the country manager has some basic knowledge of statistical analysis and data mining techniques.

Using SAP Predictive Analysis, he builds a segmentation analysis by using the R-K-Means algorithm.

To build a segmentation analysis, proceed as follows:


2. From the toolbar, choose New Document.

3. Choose CSV.

4. Choose Browse and select the Retail Stores.csv file.

5. Choose Acquire.

6. Switch to the Predict panel.

7. From the Algorithms tab, double-click the R-K-Means algorithm.

The algorithm component is automatically connected to the data reader component.

8. Right-click the R-K-Means algorithm and choose Configure Properties.



9. In the R-K-Means properties dialog box, provide the necessary details:

a. Select the columns to be used for cluster analysis.

b. In the Number of Clusters field, enter 3.

c. Retain the default values for the advanced properties.

d. Choose Save and Close.

10. From the Data Writers tab, double-click the CSV Writer component.

11. Right-click the CSV Writer component and choose Configure Properties.

12. In the CSV Writer properties dialog box, select a CSV file to store the result.


14. Choose [icon] to run the analysis.

The clustering results are stored in the CSV file.

15. Switch to the Results view.

16. In the Components Selector pane, select R-K-Means.

By default, the results of the component are displayed in the Grid pane.

17. To view the visualization chart, switch to the Charts pane.

18. From the File menu, select Save.


20. Choose Save.
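The segmentation performed in this scenario can be pictured with a small k-means example. The Python sketch below uses Lloyd's algorithm with three clusters. It is an illustration only and not the R-K-Means component itself: the store figures are invented, only two attributes are used, and taking the first k points as initial centers is a simplifying assumption.

```python
import math


def k_means(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as the initial centers."""
    centers = [tuple(point) for point in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(point, centers[c]))
                  for point in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(column) / len(members)
                                   for column in zip(*members))
    return labels, centers


# Hypothetical stores described by (sales turnover, profit margin), pre-scaled.
stores = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.8), (5.1, 4.9), (8.8, 1.2),
    (0.9, 1.1), (4.9, 5.2), (9.1, 0.9),
]

labels, centers = k_means(stores, k=3)
print(labels)
```

With attributes on very different scales, such as turnover and staff count, it is usually worth scaling the columns before clustering so that no single attribute dominates the distance.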



Component Properties

11.1 Algorithms

Use algorithms to perform data mining and statistical analysis on your data, for example, to determine trends and patterns in data.

SAP Predictive Analysis provides built-in algorithms such as regressions, time series, and outliers. The application also supports decision trees, k-means, neural network, time series, and regression algorithms from the open-source R library. You can also perform in-database analysis using Predictive Analysis Library (PAL) algorithms from SAP HANA.

11.1.1 Regression

11.1.1.1 Exponential Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using an exponential function with the least square methodology.

Note: The data type of columns used during model scoring should be the same as the data type of columns used while building the model.
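As a minimal sketch of the idea, an exponential curve can be fitted by applying ordinary least squares to the logarithm of the dependent values. The functional form y = a * exp(b * x) used below is a common convention and an assumption here. The guide does not state the exact form or fitting procedure the component uses, and the function name and data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y). Requires y > 0."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    b = (sum((x - mean_x) * (v - mean_log) for x, v in zip(xs, logs))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_log - b * mean_x)
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # data generated with a=2, b=0.5

a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))
```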

Exponential Regression Properties

Output Mode: Select the mode in which you want to display the output data.

Possible values:

• Fill: Fills missing values in the target column.

• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.

Independent Column: Select the input source column with which you want to perform regression.

Dependent Column: Select the target column on which regression needs to be performed.

Missing Values: Select the method for handling missing values.

Possible values:

• Remove: Algorithm skips the records containing missing values in the independent or dependent column.

• Keep: Retains missing values.

• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.

Predicted Column Name: Enter a name for the newly created column that contains the predicted values.

11.1.1.2 Geometric Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a geometric function with the least square methodology.
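A geometric (power) curve is commonly written as y = a * x^b and can be fitted by least squares after taking logarithms of both columns. The Python sketch below assumes that form. The guide does not spell out the function the component uses, so treat the form, the function name, and the data as hypothetical.

```python
import math


def fit_geometric(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y). Requires x, y > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
         / sum((u - mean_x) ** 2 for u in log_x))
    a = math.exp(mean_y - b * mean_x)
    return a, b


xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 2 for x in xs]   # data generated with a=3, b=2

a_geo, b_geo = fit_geometric(xs, ys)
print(round(a_geo, 6), round(b_geo, 6))
```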


Geometric Regression Properties

Output Mode: Select the mode in which you want to display output data.

Possible values:

• Fill: Fills missing values in the target column.

• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.

Independent Column: Select the input source column with which you want to perform regression.

Dependent Column: Select the target column on which regression needs to be performed.

Missing Values: Select the method for handling missing values.

Possible values:

• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.

• Keep: Retains missing values.

• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and description for the model.

Predicted Column Name: Enter a name for the newly created column that contains predicted values.

11.1.1.3 HANA Multiple Linear Regression

Use this algorithm to find the linear relationship between a dependent variable and one or more independent variables.

HANA Multiple Linear Regression Properties


Output Mode: Select the mode in which you want to display the output data.

Possible values:

• Fill: Fills missing values in the target column.

• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.

Independent Columns: Select the input source columns with which you want to perform regression.

Dependent Column: Select the target column on which you want to perform regression.

Missing Values: Select the method for handling missing values.

Possible values:

• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.

• Keep: Retains missing values.

• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.

Number of Threads: Enter the number of threads that can be used for execution.

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.

Predicted Column Name: Enter a name for the newly created column that contains the predicted values.


11.1.1.4 Linear Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable with the least square methodology.
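The least squares fit, and the difference between the Fill and Trend output modes used by the regression components, can be sketched in a few lines of Python. This is one interpretation of those modes under stated assumptions, not the component's code: records with a missing dependent value are skipped when fitting, and the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


independent = [1.0, 2.0, 3.0, 4.0, 5.0]
dependent = [3.0, 5.0, None, 9.0, 11.0]      # None marks a missing value

# Fit on the complete records only (assumption: missing records are skipped).
pairs = [(x, y) for x, y in zip(independent, dependent) if y is not None]
slope, intercept = fit_line([x for x, _ in pairs], [y for _, y in pairs])

# Trend mode: an extra column holding the predicted value for every record.
predicted = [intercept + slope * x for x in independent]

# Fill mode: keep known values and replace only the missing ones.
filled = [y if y is not None else intercept + slope * x
          for x, y in zip(independent, dependent)]

print(slope, intercept, filled)
```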


Linear Regression Properties


Possible values:



Output Mode




Possible values:




Missing Values


Save the Model

Enter a name for the newly created column that contains the predicted values.


11.1.1.5 Logarithmic Regression



Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a logarithmic function with the least square methodology.
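A logarithmic curve is commonly written as y = a + b * ln(x), which is linear in ln(x) and can therefore be fitted with ordinary least squares. The Python sketch below assumes that form. The exact function used by the component is not given in this guide, and the function name and data are hypothetical.

```python
import math


def fit_logarithmic(xs, ys):
    """Fit y = a + b * ln(x) by least squares. Requires x > 0."""
    log_x = [math.log(x) for x in xs]
    n = len(xs)
    mean_u = sum(log_x) / n
    mean_y = sum(ys) / n
    b = (sum((u - mean_u) * (y - mean_y) for u, y in zip(log_x, ys))
         / sum((u - mean_u) ** 2 for u in log_x))
    a = mean_y - b * mean_u
    return a, b


xs = [1.0, 2.0, 4.0, 8.0]
ys = [1.0 + 2.0 * math.log(x) for x in xs]   # data generated with a=1, b=2

a_log, b_log = fit_logarithmic(xs, ys)
print(round(a_log, 6), round(b_log, 6))
```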


Logarithmic Regression Properties


Possible values:



Output Mode

Independent Column: Select the input source column with which you want to perform regression.

Dependent Column: Select the target column on which you want to perform regression.


Possible values:




Missing Values

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.



11.1.1.6 R-Exponential Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using an exponential function from the R open-source library.




R-Exponential Regression Properties


Possible values:



Output Mode




Possible values:




Missing Values

Allow Singular Fit: A Boolean value. If set to true, the aliased coefficients are ignored in the coefficient covariance matrix. If set to false, a model with aliased coefficients produces an error.

A model with aliased coefficients signifies that the square matrix x*x is singular.

Contrasts: Select the list of contrasts to be used for factors appearing as variables in the model.

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and description for the model.

Predicted Column Name

11.1.1.7 R-Geometric Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a geometric function from the R open-source library.




R-Geometric Regression Properties


Possible values:



Output Mode




Possible values:




Missing Values

Allow Singular Fit: A Boolean value. If set to true, the aliased coefficients are ignored in the coefficient covariance matrix. If set to false, a model with aliased coefficients produces an error.

Contrasts: Select the list of contrasts to be used for factors appearing as variables in the model.

Save the Model

Predicted Column Name

11.1.1.8 R-Linear Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable by using the R open-source library.




R-Linear Regression Properties


Possible values:



Output Mode




Possible values:



• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.

Missing Values

Allow Singular Fit: A Boolean value. If set to true, the aliased coefficients are ignored in the coefficient covariance matrix. If set to false, a model with aliased coefficients produces an error.

Contrasts: Select the list of contrasts to be used for factors appearing as variables in the model.

Save the Model

Predicted Column Name

11.1.1.9 R-Logarithmic Regression

Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a logarithmic function from the R open-source library.




R-Multiple Linear Regression Properties


Output Mode

Independent Columns: Select the input source columns with which you want to perform regression.

Dependent Column: Select the target column on which regression needs to be performed.

Missing Values: Possible values:
• Stop: The algorithm stops execution if a value is missing in the independent column or the dependent column.

Confidence Level: Enter the confidence level of the algorithm (the accuracy of predictions).

Save the Model



11.1.2 Outliers

11.1.2.1 Inter Quartile Range

Use this algorithm to find outlying values based on the statistical distribution between the first and third quartiles.

Note: The input data for the IQR algorithm must contain at least 4 rows.



Inter Quartile Range Properties


Output Mode: Possible values:
• Show Outliers: Adds a Boolean column to the input data specifying whether the corresponding value is an outlier.
• Remove Outliers: Removes outlying values from the input data.

Independent Column: Select the input source column.

Missing Values

Fence Coefficient: Enter the deviation allowed for values from the inter quartile range.
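The following Python sketch shows one conventional reading of the inter quartile range test: a value is an outlier if it lies more than Fence Coefficient times the IQR below the first quartile or above the third quartile. The quartile method (Python's default "exclusive" method), the default coefficient of 1.5, and the names used are assumptions for illustration; the product may compute quartiles differently.

```python
from statistics import quantiles

def iqr_outlier_flags(values, fence_coefficient=1.5):
    """Return one Boolean per value: True if the value lies outside the fences."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, and third quartiles
    iqr = q3 - q1
    lower = q1 - fence_coefficient * iqr
    upper = q3 + fence_coefficient * iqr
    return [v < lower or v > upper for v in values]

# Hypothetical input column with one extreme value
ages = [25, 30, 33, 32, 24, 41, 46, 21, 26, 95]

# "Show Outliers" style result: a Boolean column alongside the data
flags = iqr_outlier_flags(ages)

# "Remove Outliers" style result: the data without the flagged values
kept = [v for v, is_outlier in zip(ages, flags) if not is_outlier]
```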

11.1.2.2 Nearest Neighbor Outlier

Use this algorithm to find outlying values based on the number of neighbors (N) and the average distance of values compared to their nearest N neighbors.

Nearest Neighbor Outlier Properties


Output Mode: Possible values:
• Show Outliers: Adds a Boolean column to the input data specifying whether the corresponding value is an outlier.
• Remove Outliers: Removes outlying values from the input data.

Independent Column: Select the input source column.

Missing Values: Select the method for handling missing values. Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.



Neighborhood Count: Enter the number of nearest neighbors (N) to use when calculating the average distance.

Number of Outliers: Enter the number of outliers to be removed.

Predicted Column Name: Enter a name for the new column that contains the predicted values.
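A minimal sketch of the idea described above, assuming a single numeric column, absolute difference as the distance, and the highest-scoring values being reported as outliers. The function and parameter names mirror the property names but are otherwise hypothetical.

```python
def nearest_neighbor_outliers(values, neighborhood_count, number_of_outliers):
    """Flag the values whose average distance to their nearest neighbors is largest."""
    scores = []
    for i, v in enumerate(values):
        # Distances from this value to every other value, smallest first
        distances = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        nearest = distances[:neighborhood_count]
        scores.append(sum(nearest) / len(nearest))
    # Rank rows by score, highest (most isolated) first
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    outlier_rows = set(ranked[:number_of_outliers])
    return [i in outlier_rows for i in range(len(values))]

values = [10, 11, 12, 13, 50, 12, 11]
flags = nearest_neighbor_outliers(values, neighborhood_count=2, number_of_outliers=1)
```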

11.1.3 Time Series

11.1.3.1 Triple Exponential Smoothing

Use this algorithm to smooth the source data and find seasonal trends in data.

Triple Exponential Smoothing Properties

Output Mode: Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.
• Forecast: Displays forecasted values for the given time period.

Dependent Column: Select the input column to be forecasted.

Consider Date Column: Select this option to specify whether to use the date column.

Date Column: Enter the name of the column that contains date values.

Missing Values: Select the method to handle missing entries.
• Remove: The algorithm skips the records containing missing values in the independent column or the dependent column.

Period: Select the period for forecasting.

Periods Per Year: Select the periods for forecasting. This option is only enabled if you select "Custom" for "Period".

Start Year: Enter the year from which the observations are to be considered. For example, 2009, 1987, 2019.

Start Period: Enter the period from which the observations are to be considered.

Periods to Predict: Enter the number of periods to predict.

Save the Model





Year Values: Enter a name for the newly created column that contains year values.

Quarter Values: Enter a name for the newly created column that contains quarter values.

Month Values: Enter a name for the newly created column that contains month values.

Period Values: Enter a name for the newly created column that contains period values.

Alpha: Enter a smoothing constant for smoothing observations (base parameters). Range: 0-1.

Beta: Enter a smoothing constant for finding trend parameters. Range: 0-1.

Gamma: Enter a smoothing constant for finding seasonal trend parameters. Range: 0-1.
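To show how Alpha, Beta, and Gamma interact, here is a minimal additive Holt-Winters sketch in Python. It is an illustration under stated assumptions, not the product's algorithm: the seasonal component is additive, the series contains at least two full seasons, and the initial level, trend, and seasonal values are derived from the first two seasons.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, periods_to_predict):
    """Additive triple exponential smoothing (illustrative initialisation)."""
    first, second = series[:period], series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [x - level for x in first]

    fitted = []
    for t, x in enumerate(series):
        s = seasonal[t % period]
        fitted.append(level + trend + s)  # one-step-ahead prediction ("Trend" output)
        previous_level = level
        level = alpha * (x - s) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (x - level) + (1 - gamma) * s

    n = len(series)
    forecast = [level + (h + 1) * trend + seasonal[(n + h) % period]
                for h in range(periods_to_predict)]  # "Forecast" output
    return fitted, forecast

# Hypothetical quarterly series with a repeating seasonal pattern and no trend
series = [10, 20, 30, 40] * 3
fitted, forecast = holt_winters_additive(series, period=4, alpha=0.5, beta=0.3,
                                         gamma=0.1, periods_to_predict=4)
```

Values of Alpha, Beta, and Gamma closer to 1 make the level, trend, and seasonal estimates react faster to recent observations; values closer to 0 smooth more heavily.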

11.1.3.2 R-Double Exponential Smoothing

Use this algorithm to smooth the source data and find trends in data.

R-Double Exponential Smoothing Properties

Output Mode: Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.

Periods Per Year: Select the periods for forecasting. This option is only enabled if you select "Custom" for "Period".

Start Year

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.










Alpha

No. Periodic Observations: Enter the number of periodic observations required to start the calculation.

Level: Enter the start value for level (a[0]) (l.start). For example: 0.4

Trend: Enter the start value for finding trend parameters (b[0]) (b.start). For example: 0.4

Optimizer Inputs: Enter the starting values for alpha, beta, and gamma required for the optimizer. For example: 0.3, 0.1, 0.1
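The role of the Level and Trend start values can be seen in a small sketch of double exponential smoothing (Holt's linear method). This is an assumption-laden illustration in Python, not the R routine the component calls; `level_start` and `trend_start` stand in for l.start and b.start.

```python
def holt_linear(series, alpha, beta, level_start, trend_start, periods_to_predict):
    """Double exponential smoothing with explicit start values for level and trend."""
    level, trend = level_start, trend_start
    fitted = []
    for x in series:
        fitted.append(level + trend)  # one-step-ahead prediction
        previous_level = level
        level = alpha * x + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    forecast = [level + h * trend for h in range(1, periods_to_predict + 1)]
    return fitted, forecast

# Hypothetical series that grows by 2 per period
series = [12, 14, 16, 18, 20]
fitted, forecast = holt_linear(series, alpha=0.8, beta=0.2,
                               level_start=10, trend_start=2,
                               periods_to_predict=3)
```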

11.1.3.3 R-Single Exponential Smoothing

Use this algorithm to smooth the source data.
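As a rough illustration, single exponential smoothing replaces each observation with a weighted average of the observation and the previous smoothed value. The sketch below assumes the first observation is used as the starting level; the function name is hypothetical.

```python
def single_exponential_smoothing(series, alpha):
    """Smooth a series; alpha in the range 0-1 weights the newest observation."""
    smoothed = [series[0]]  # assumption: start from the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

smoothed = single_exponential_smoothing([10, 20, 10, 20], alpha=0.5)
```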

R-Single Exponential Smoothing Properties

Output Mode: Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.

Periods Per Year: Select the period for forecasting. This option is only enabled if you select "Custom" for "Period".

Start Year

Save the Model






Quarter Values: Enter a name for the newly created column that contains quarter values.

Alpha

No. Periodic Observations: Enter the number of periodic observations required to start the calculation.



11.1.3.4 R-Triple Exponential Smoothing

Use this algorithm to smooth source data and find seasonal trends in data.

R-Triple Exponential Smoothing Properties

Output Mode: Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.

Periods Per Year: Select the period for forecasting. This option is only enabled if you select "Custom" for "Period".

Start Year

Save the Model










Alpha

Gamma: Enter a smoothing constant for finding seasonal trend parameters.

Seasonal: Select the type of Holt-Winters exponential smoothing algorithm.

No. Periodic Observations: Enter the number of periodic observations required to start the calculation.

Trend: Enter the start value for finding trend parameters (b[0]) (b.start). For example: 0.4

Season: Enter start values for finding seasonal parameters (s.start). This value is dependent on the column you select. For example, if you select quarter as period, you need to provide four double values.

Optimizer Inputs: Enter the starting values for alpha, beta, and gamma required for the optimizer. For example: 0.3, 0.1, 0.1

11.1.4 Decision Trees

11.1.4.1 HANA C 4.5

Use this algorithm to classify observations into groups and predict one or more discrete variables based on other variables.



HANA C 4.5 Properties


Output Mode: Possible values:
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.

Independent Columns: Select input source columns.

Dependent Column: Select the target column.

Missing Values: Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.

Percentage: Enter the percentage of data to be considered for analysis.

Number of Threads: Enter the number of threads to be used for execution.

Column Name: Enter the name of the independent column containing numerical values.

Enter Bin Ranges: Enter bin ranges.

11.1.4.2 R-CNR Tree

Use this algorithm to classify observations into groups and predict one or more discrete variables based on other variables. However, you can also use this algorithm to find trends in data.

Note:

• The "rpart" package, which is part of R 2.11.1, cannot handle column names with spaces or special characters. The "rpart" package supports only the input column name format that is supported by the R data frame.

• The CNR tree does not work if the version of the caret package in R is earlier than 4.85.

• Independent column names used while scoring the model must be the same as the independent column names used while creating the model.

• Column names containing spaces or any special character other than a period (.) are not supported.



R-CNR Tree Properties


Possible values:



Output Mode




Missing Values: Possible values:
• Rpart: The algorithm deletes all observations for which the dependent column is missing. However, it retains those observations for which one or more independent columns are missing.
• Remove: The algorithm skips the records containing missing values in the independent columns or the dependent column.
• Stop: The algorithm stops execution if a value is missing in the independent column or the dependent column.

Method: Select the splitting rule type. Possible values:
• Classification: Use this method if the dependent variable has categorical values.
• Regression: Use this method if the dependent variable has continuous values.

Minimum Split: Enter the minimum number of observations required for splitting a node.

Split Criteria: Select the splitting criteria of the node. Possible values:
• Gini: Gini impurity.
• Information: Information gain.
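The two split criteria can be illustrated with their textbook definitions. The Python sketch below computes Gini impurity and information gain (entropy reduction) for a candidate split; it is a generic illustration, not the rpart implementation, and all names and data are hypothetical.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical node with a perfectly separating split
parent = ["yes"] * 4 + ["no"] * 4
left, right = ["yes"] * 4, ["no"] * 4

parent_gini = gini_impurity(parent)
gain = information_gain(parent, left, right)
```

A split is preferred when it lowers the impurity of the child nodes the most (Gini) or yields the largest information gain (Information).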

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.

Predicted Column Name: Enter a name for the newly created column that contains the predicted values.

Complexity Parameter: Enter the complexity parameter that saves computing time by preventing any split that does not improve the fit.



Maximum Depth: Enter the maximum node level in the final tree, with the root node counted as level 0.
Note: If the maximum depth is greater than 30, the algorithm does not produce the expected results on 32-bit machines.

Cross Validation: Enter the number of cross validations. A higher cross validation value increases the computational time and produces more accurate results.

Prior Probability: Enter the vector of prior probabilities.

Use Surrogate: Select the surrogate to use in the splitting process. Possible values:
• Display Only: An observation with a missing value for the primary split rule is not sent further down the tree.
• Use Surrogate: Use this option to split subjects missing the primary variable; if all surrogates are missing, the observation is not split.
• Stop if missing: If all surrogates are missing, sends the observation in the majority direction.

Surrogate Style: Enter the style that controls the selection of the best surrogate. Possible values:
• Use total correct classification: The algorithm uses the total number of correct classifications to find a potential surrogate variable.
• Use percent non missing cases: The algorithm uses the percentage of non-missing cases classified to find a potential surrogate.

Maximum Surrogate: Enter the maximum number of surrogates to be retained at each node in a tree.

11.1.5 Neural Network

11.1.5.1 R-MONMLP Neural Network

Use this algorithm for forecasting, classification, and statistical pattern recognition using R library functions.



Note: R does not support PMML storage for MONMLP Neural Network.

R-MONMLP Neural Network Properties


Output Mode: Possible values:
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.

Hidden Layer1 Neurons: Enter the number of nodes/neurons in the first hidden layer (hidden1).

Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.

Predicted Column Name: Enter a name for the newly created column that contains the predicted values.

Hidden Layer Transfer Function: Select the activation function to be used for the hidden layer (Th).

Output Layer Transfer Function: Select the activation function to be used for the output layer (To).

Derivative of Hidden Layer Transfer Function: Select the derivative of the hidden layer activation function (Th.prime).

Derivative of Output Layer Transfer Function: Select the derivative of the output layer activation function (To.prime).

Hidden Layer2 Neurons: Enter the number of nodes/neurons in the second hidden layer (hidden2).

Maximum Iterations: Enter the maximum number of iterations for the optimization algorithm (iter.max).

Monotone Columns: Enter the indexes of the columns to which you want to apply the monotonicity constraint (monotone).

Training Iterations: Enter the number of training iterations after which the cost function calculation stops (iter.stopped).

Initial Weights: Enter an initial weight vector (init.weights).

Maximum Exceptions: Enter the maximum number of exceptions for the optimization routine (max.exceptions).

Scale Dependent Column: To scale dependent columns to zero mean and unit variance prior to fitting, select True (scale.y).



Bagging Required: To use bootstrap aggregation, select True (bag).

Trials to Avoid Local Minima: Enter the number of repeated trials to avoid local minima (n.trials).

No. Ensemble Members: Enter the number of ensemble members to fit (n.ensemble).

11.1.5.2 R-NNet Neural Network

Use this algorithm for forecasting, classification, and statistical pattern recognition using R library functions.

R-NNet Neural Network Properties

Output Mode: Select the mode in which you want to display the output data.

Missing Values: Select the method for handling missing values. Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: The algorithm retains missing values for processing.
• Stop: The algorithm stops if a value is missing in the independent column or the dependent column.

Hidden Layer Neurons: Enter the number of nodes/neurons in the hidden layer.

Type: Select the type of analysis to be done by the algorithm.

Skip Hidden Layer: To add skip-layer connections from input to output, select True.

Linear Output: To obtain the linear output, select True. If you select the analysis type Classification, this value must be true.

Use Softmax: Select True to use "log-linear model" and "maximum conditional likelihood" fittings. linout, entropy, softmax, and censored are mutually exclusive.



Use Entropy: To use "Maximum Conditional Likelihood" fitting, select True. By default, the algorithm uses the least-squares method. Possible values:
• True: Use the "Maximum Conditional Likelihood" fitting.
• False: Use the least-squares method.

Use Censored: For softmax, a row of (0,1,1) indicates one example each of classes 2 and 3, but for censored it indicates one example whose class is known only to be 2 or 3.

Range: Enter the range of the initial random weights, [-rang, rang]. Set this value to 0.5 unless the input is large. If the input is large, choose rang so that rang * max(|x|) <= 1.
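The rule for choosing the Range value can be written as a short helper. This Python sketch is only an illustration of the formula rang * max(|x|) <= 1 quoted above; the function name `suggest_rang` and the sample inputs are hypothetical.

```python
def suggest_rang(inputs, default=0.5):
    """Return the default unless the inputs are large, then use 1 / max(|x|)."""
    largest = max(abs(x) for row in inputs for x in row)
    if default * largest <= 1:
        return default      # the default already satisfies rang * max(|x|) <= 1
    return 1.0 / largest    # largest value for which rang * max(|x|) <= 1 holds

small_inputs = [[0.2, -1.5], [0.7, 1.1]]
large_inputs = [[10.0, -40.0], [25.0, 5.0]]

rang_small = suggest_rang(small_inputs)
rang_large = suggest_rang(large_inputs)
```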

Weight Decay: Enter a value used for calculating new weights (weight decay).

Maximum Iterations: Enter the maximum number of iterations allowed.

Hessian Matrix Required: To return the Hessian measure at the best set of weights, select True.

Maximum Weights: Enter the maximum number of weights allowed in the calculation. There is no intrinsic limit in the code, but increasing the maximum number of weights may allow fits that are very slow and time-consuming.

Abstol: Enter the value that indicates the perfect fit (abstol).

Reltol: The algorithm terminates if the optimizer is unable to reduce the fit criterion by a factor of at least 1 - reltol.

Contrasts: Enter the list of contrasts to be used for factors appearing as variables in the model.

Save the Model

11.1.6 Clustering

11.1.6.1 HANA K-Means

Use this algorithm to cluster observations into groups of related observations without any prior knowledge of those relationships. The algorithm clusters observations into k groups, where k is provided as an input parameter. The algorithm then assigns each observation to clusters based on the proximity of the observation to the mean of the cluster. The process continues until the clusters converge.

Note:

• You might obtain a different cluster number for each cluster each time you execute the HANA K-Means algorithm. However, the observations in each cluster remain the same.

• Creating models using the HANA K-Means algorithm is not supported.
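The assign-and-update loop described above can be sketched in a few lines of Python. This is a generic k-means illustration (random initial centers, squared Euclidean distance), not the HANA implementation; the function name, the seed, and the sample points are assumptions.

```python
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Assign points to the nearest center, recompute centers, repeat until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers are random points
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_assignment == assignment:  # clusters have converged
            break
        assignment = new_assignment
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(points, k=2)
```

The cluster numbers in `labels` depend on which initial centers are drawn, which is why the same group of observations can receive a different cluster number from one run to the next.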

HANA K-Means Properties

Output Mode: Select the mode in which you want to display the output data.

Independent Columns: Select the input source columns.

Missing Values: Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Ignore: The algorithm ignores the records containing missing values during calculation. However, the records are retained in the result table.
• Stop: The algorithm stops if a value is missing in the independent column or the dependent column.

Number of Clusters: Enter the number of groups for clustering.

Cluster Name: Enter a name for the newly created column that contains the cluster name.

Maximum Iterations: Enter the number of iterations allowed for finding clusters.

Center Calculation Method: Select the method to be used for calculating initial cluster centers.

Normalization: To normalize the data, select True.

Number of Threads: Enter the number of threads that can be used for execution.

Exit Threshold: Enter the threshold value for exiting from the iterations.

11.1.6.2 R-K-Means

Use this algorithm to cluster observations into groups of related observations without any prior knowledge of those relationships. The algorithm clusters observations into k groups, where k is provided as an input parameter. The algorithm then assigns each observation to clusters based on the proximity of the observation to the mean of the cluster. The process continues until the clusters converge.



Note:

• You might obtain a different cluster number for each cluster each time you execute the R-K-Means algorithm. However, the observations in each cluster remain the same.

• Creating models using the R-K-Means algorithm is not supported.

R-K-Means Properties

Output Mode: Select the mode in which you want to display the output data.

Independent Columns: Select the input source columns.

Number of Clusters: Enter the number of groups for clustering.

Cluster Name: Enter a name for the newly created column that contains the cluster name.

Maximum Iterations: Enter the number of iterations allowed for finding clusters.

Number of Initial Sets: Enter the number of random initial sets for clustering (nstart).

Algorithm: Select the type of algorithm to be used for performing K-Means clustering.

11.1.7 Association

11.1.7.1 HANA Apriori

Use this algorithm to find frequent itemset patterns in large transactional datasets for generating association rules. This algorithm is used to understand what products and services customers tend to purchase at the same time. By analyzing the purchasing trends of customers with association analysis, you can predict their future behavior.

For example, the information that a customer who buys shoes is more likely to buy socks at the same time can be represented in an association rule (with a given minimum support and minimum confidence) as: Shoes => Socks [support = 0.5, confidence = 0.1]
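The measures attached to a rule follow from simple counts over the transactions. The Python sketch below uses the standard definitions of support, confidence, and lift for a rule such as Shoes => Socks; it does not perform the Apriori itemset search itself, and the transactions and function name are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    n = len(transactions)
    lhs, rhs = set(antecedent), set(consequent)
    count_lhs = sum(1 for t in transactions if lhs <= t)
    count_rhs = sum(1 for t in transactions if rhs <= t)
    count_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = count_both / n             # share of transactions containing both sides
    confidence = count_both / count_lhs  # share of LHS transactions that contain RHS
    lift = confidence / (count_rhs / n)  # confidence relative to the RHS base rate
    return support, confidence, lift

# Hypothetical transactions, one set of items per transaction ID
transactions = [
    {"shoes", "socks"},
    {"shoes", "socks", "belt"},
    {"shoes"},
    {"belt"},
]
support, confidence, lift = rule_metrics(transactions, ["shoes"], ["socks"])
```

Rules are kept only if their support and confidence reach the minimum values entered in the Support and Confidence properties.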

HANA Apriori Properties

Item Column(s): Select the columns containing the items to which you want to apply the algorithm.

TransactionID Column: Select the column containing the transaction IDs to which you want to apply the algorithm.




Missing Values: Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: The algorithm retains missing values for processing.

Support: Enter a value for the minimum support of an item.

Confidence: Enter a value for the minimum confidence of rules/association.

Pre Rule: Enter a name for the new column that contains the antecedent (LHS) of the apriori rule for the given dataset.

Post Rule: Enter a name for the new column that contains the consequent (RHS) of the apriori rule for the given dataset.

Support Values: Enter a name for the new column that contains the support for the corresponding rules.

Confidence Values: Enter a name for the new column that contains the confidence values for the corresponding rules.

Lift values: Enter a name for the new column that contains the lift values for the corresponding rules.

Number of Threads: Enter the number of threads to be used for execution.

11.1.7.2 R-Apriori

Use this algorithm to find frequent itemset patterns in large transactional datasets for generating association rules using the "arules" R package. This algorithm is used to understand what products and services customers tend to purchase at the same time. By analyzing the purchasing trends of customers with association analysis, you can predict their future behavior.

For example, the information that a customer who buys shoes is more likely to buy socks at the same time can be represented in an association rule (with a given minimum support and minimum confidence) as: Shoes => Socks [support = 0.5, confidence = 0.1]

R-Apriori Properties

Output Mode: Select the mode to display the output.

Input Format: Select the format of the input data.

Item Column(s): Select the columns containing the items to which you want to apply the algorithm.

TransactionID Column: Select the column containing the transaction IDs to which you want to apply the algorithm.



Support: Enter a value for the minimum support of an item.

Confidence: Enter a value for the minimum confidence of rules/association.

Save the Model

Rules: Enter a name for the new column that contains the apriori rules for the given dataset.

Support Values: Enter a name for the new column that contains the support for the corresponding rules.

Confidence Values: Enter a name for the new column that contains the confidence values for the corresponding rules.

Lift values: Enter a name for the new column that contains the lift values for the corresponding rules.

Transaction ID: Enter a name for the new column that contains the transaction ID.

Items: Enter a name for the new column that contains the names of the items.

Matching Rules: Enter a name for the new column that contains the matching rules.

Lhs Item(s): Enter comma-separated labels for the items that appear on the left-hand side of rules or itemsets.

Rhs Item(s): Enter comma-separated labels for the items that appear on the right-hand side of rules or itemsets.

Both Item(s): Enter comma-separated labels for the items that appear on both sides of rules or itemsets.

None Item(s): Enter comma-separated labels for the items that must not appear in the rules or itemsets.

Default Appearance: Enter the default appearance of items that are not explicitly mentioned.

Sort Items: Select the sort option to sort items by their frequency.

Filter Items: Enter a numerical value that indicates how to filter unused items from transactions.

Tree View: To organize transactions as a prefix tree, select True.

Use HeapSort: To use heap sort instead of quick sort to sort transactions, select True.

Minimize Memory: To minimize memory usage instead of maximizing speed, select True.

Load Transaction: To load transactions into memory, select True.

11.1.8 Classification



11.1.8.1 HANA KNN

Use this component to classify objects based on the trained sample data. In KNN, each object is classified by a majority vote of its neighbors.
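The majority-vote idea can be sketched as follows. This Python example assumes squared Euclidean distance and an unweighted vote; the function name and the training rows are hypothetical, and the HANA component may offer other voting types.

```python
from collections import Counter

def knn_classify(train_rows, train_labels, query, neighborhood_count):
    """Return the most common label among the nearest training rows."""
    order = sorted(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], query)),
    )
    votes = Counter(train_labels[i] for i in order[:neighborhood_count])
    return votes.most_common(1)[0][0]

# Hypothetical trained data: independent columns and a dependent column
train_rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
train_labels = ["low", "low", "low", "high", "high"]

near_low = knn_classify(train_rows, train_labels, (1.5, 1.5), neighborhood_count=3)
near_high = knn_classify(train_rows, train_labels, (8.5, 8.0), neighborhood_count=3)
```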

HANA KNN Properties


Neighborhood Count: Enter the number of neighbors to consider for finding distances.

Voting Type: Select the voting type.

Missing Values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: The algorithm considers missing values for processing.
• Stop: The algorithm stops the execution if a value is missing in the independent column or the dependent column.

Schema Name: Enter the schema that contains the trained data.

Table Name: Enter the table that contains the trained data.

Independent Columns: Enter input columns to be considered for training data.

Dependent Column: Enter the output column to be considered for training data.

Number of Threads: Enter the number of threads to be used for execution.

Enter a name for the new column that contains the classification values.


11.2 Data Preparation Components

Use data preparation components to prepare the data for analysis. These are optional components.

11.2.1 Formula

Use this component to apply predefined functions and operators on the data. All functions and expressions except data manipulation functions add a new column with the formula result.



Note:

• When entering a string literal that contains single quotation marks, each single quotation mark inside the string literal must be escaped with a backslash character. For example, enter 'Customer's' as 'Customer\'s'.

• When entering a column name that contains square brackets, each square bracket inside the column name must be escaped with a backslash character. For example, enter [Customer[Age]] as [Customer\[Age\]].
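If formulas are generated by a script, the two escaping rules above can be applied mechanically. The helpers below are a hypothetical convenience written in Python; they are not part of the product.

```python
def escape_string_literal(text):
    """Wrap text in single quotes, escaping each embedded single quote."""
    return "'" + text.replace("'", "\\'") + "'"

def escape_column_name(name):
    """Wrap a column name in square brackets, escaping embedded brackets."""
    escaped = name.replace("[", "\\[").replace("]", "\\]")
    return "[" + escaped + "]"

literal = escape_string_literal("Customer's")
column = escape_column_name("Customer[Age]")
print(literal)  # 'Customer\'s'
print(column)   # [Customer\[Age\]]
```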

Formula Properties

Name: Enter a name for the new column created by applying the formula.

Expression: Enter the formula you want to apply. For example, Average([Age]).

Example: Calculating average age of employees

Employee Table:

Emp ID   Emp Name   DOB          Age   Date of Joining   Date of Confirmation
1        Laura      11/11/1986   25    12/9/2005         27/11/2005
2        Desy       12/5/1981    30    24/6/2000         10/7/2000
3        Alex       30/5/1978    33    10/10/1998        24/12/1998
4        John       6/6/1979     32    2/12/1999         20/12/1999

1. Drag the Formula component onto the analysis editor.

2. In the properties view, enter a name for the formula.

For example, Average_Age.

3. In the Expression field, enter the formula: AVERAGE([Age])

4. Choose Validate and Apply to validate the formula syntax.

Output table:



Emp ID   Emp Name   DOB          Age   Date of Joining   Date of Confirmation   Average_Age
1        Laura      11/11/1986   25    12/9/2005         27/11/2005             30
2        Desy       12/5/1981    30    24/6/2000         10/7/2000              30
3        Alex       30/5/1978    33    10/10/1998        24/12/1998             30
4        John       6/6/1979     32    2/12/1999         20/12/1999             30
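The output can be checked by hand: (25 + 30 + 33 + 32) / 4 = 30, and the same value is written to every row of the new column. A small Python sketch of that behavior, using a hypothetical in-memory copy of the Employee table:

```python
employees = [
    {"Emp ID": 1, "Emp Name": "Laura", "Age": 25},
    {"Emp ID": 2, "Emp Name": "Desy", "Age": 30},
    {"Emp ID": 3, "Emp Name": "Alex", "Age": 33},
    {"Emp ID": 4, "Emp Name": "John", "Age": 32},
]

# AVERAGE([Age]) yields a single value for the whole column ...
average_age = sum(row["Age"] for row in employees) / len(employees)

# ... which is repeated in each row of the new Average_Age column
for row in employees:
    row["Average_Age"] = average_age
```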

Supported Functions

Functions are grouped by category. Where an example refers to columns, it shows the function applied to the Employee table.

Category: Date

DAYSBETWEEN: Returns the number of days between two dates.

CURRENTDATE: Returns the current system date.

MONTHSBETWEEN: Returns the number of months between two dates. For example, the new column contains 2, 0, 2, 0 when MONTHSBETWEEN([Date of Joining],[Date of Confirmation]) is applied to the Employee table.

DAYNAME: Returns the day name in string format. For example, the new column contains Monday, Saturday, Saturday, Thursday when DAYNAME([Date of Joining]) is applied to the Employee table.

DAYNUMBEROFMONTH: Returns the day number of the particular month. For example, 12/11/1980 returns 12.

DAYNUMBEROFWEEK: Returns the day number in a week. For example, Sunday=1, Monday=2.

DAYNUMBEROFYEAR: Returns the day number in a year. For example, 1st Jan=1, 1st Feb=32, 3rd Feb=34.

LASTDATEOFWEEK: Returns the date of the last day in a week. For example, 12/9/2005 returns 17/9/2005.

LASTDATEOFMONTH: Returns the date of the last day in a month.

MONTHNUMBEROFYEAR: Returns the month number in a date. For example, Jan=1, Feb=2, Mar=3.

WEEKNUMBEROFYEAR: Returns the week number in a year.

QUARTERNUMBEROFDATE: Returns the quarter number in a date.

Category: String

CONCAT: Concatenates two strings. For example, CONCAT('USA', 'Australia') returns USAAustralia.

INSTRING: Returns true if the search string is found in the source string. For example, INSTRING('USA', 'US') returns true.

SUBSTRING: Returns a substring from the source string. For example, SUBSTRING('USA', 1,2) returns US.

STRLEN: Returns the number of characters in the source string. For example, STRLEN('Australia') returns 9.

Category: Math

MAX: Returns the maximum value in a column.

MIN: Returns the minimum value in a column.

COUNT: Returns the number of values in a column.

SUM: Returns the sum of the values in a column.

AVERAGE: Returns the average of the values in a column.

Category: Data Manipulation

@REPLACE: Performs in-place replacement of a string. For example, @REPLACE([country],'USA','AMERICA') replaces USA with AMERICA in the country column.

@BLANK: Replaces blank values with a specified value. For example, @BLANK([country], 'USA') replaces all blank values with USA in the country column.

@SELECT: Selects rows that satisfy the given condition. You can use any conditional operator to specify the condition. For example, @SELECT([country]=='USA') selects rows where country is equal to USA.

Category: Conditional Expression

IF(condition) THEN(string expression/mathematical expression/conditional expression) ELSE(string expression/mathematical expression/conditional expression): Checks whether the condition is met, and returns one value if 'true' and another value if 'false'. For example, IF([Date of Joining]>12/9/2005) THEN ('Employee joined after Sept 12, 2005') ELSE ('Employee joined on or before Sept 12, 2005')

Note: Mathematical expressions containing functions that return a numerical value are not supported. For example, the expression DAYNUMBEROFMONTH(CURRENTDATE())+2 is not supported because DAYNUMBEROFMONTH returns a numerical value.
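The date function examples given for the Employee table can be reproduced with Python's standard library. The helper names below are hypothetical stand-ins for the formula functions, dates are read as day/month/year, and MONTHSBETWEEN is assumed to count completed months, which matches the documented result 2, 0, 2, 0.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def day_name(d):
    return DAY_NAMES[d.weekday()]

def day_number_of_week(d):
    return (d.weekday() + 1) % 7 + 1  # Sunday=1, Monday=2, ..., Saturday=7

def day_number_of_year(d):
    return d.timetuple().tm_yday

def last_date_of_week(d):
    return d + timedelta(days=7 - day_number_of_week(d))  # the week ends on Saturday

def months_between(start, end):
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:  # assumption: only completed months are counted
        months -= 1
    return months

joining = [date(2005, 9, 12), date(2000, 6, 24), date(1998, 10, 10), date(1999, 12, 2)]
confirmation = [date(2005, 11, 27), date(2000, 7, 10), date(1998, 12, 24), date(1999, 12, 20)]

names = [day_name(d) for d in joining]
months = [months_between(s, e) for s, e in zip(joining, confirmation)]
```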

Mathematical Operators

Use mathematical operators to create formulas containing numerical columns and/or numbers. For example, the expression [Age] + 1 adds a new column with values 26, 31, 34, 33.

Mathematical Operator   Description
+                       Addition operator
-                       Subtraction operator
*                       Multiplication operator
/                       Division operator
()                      Round brackets or parentheses
^                       Power operator
%                       Modulo operator
E                       Exponential operator

Conditional Operators

Use conditional operators to create IF THEN ELSE or SELECT expressions.

Conditional Operator   Description
==                     Equal to
!=                     Not equal to
<                      Less than
>                      Greater than
<=                     Less than or equal to
>=                     Greater than or equal to

Logical Operators

Use logical operators to compare two conditions and return 'true' or 'false'. For example, IF([Date of Joining]>12/9/2005 && [Age] >=25 ) THEN ('True') ELSE ('False') adds a new column with values True, False, False, False.

Logical Operator   Description
&&                 AND
||                 OR

11.2.2 Sample

Use this component to select a subset of data from large datasets. The Sample component supports the following sample types:

• First N: Selects the first N records in the dataset.

• Last N: Selects the last N records in the dataset.

• Every N: Selects every Nth record in the dataset, where N is an interval. For example, if N=2, the 2nd, 4th, 6th, and 8th records are selected, and so on.

• Simple Random: Randomly selects N records or N percent of the records in a dataset.

• Systematic Random: In this sample type, sample intervals or buckets are created based on the bucket size. The Sample component selects the Nth record at random from the first bucket, and then selects the Nth record from each subsequent bucket.

Sample Properties

Sampling Type: Select the type of sampling.

Limit Rows by: Select the method for limiting the rows.

Number of Rows: Enter the number of rows to be selected.

Percentage of Rows: Enter the percentage of rows to be selected.

Bucket Size: Enter the bucket size within which a random row is selected.

Interval: Enter the interval between rows to be selected.

Maximum Rows: Enter the maximum number of rows to be selected.

Example: Selecting a subset of data from a given dataset

Emp ID    Emp Name    DOB           Age
1         Laura       11/11/1986    25
2         Desy        12/5/1981     30
3         Alex        30/5/1978     33
4         John        6/6/1979      32
5         Ted         4/7/1987      24
6         Tom         30/6/1970     41
7         Anna        24/6/1965     46
8         Valerie     6/7/1990      21
9         Mary        19/9/1985     26
10        Martin      21/11/1986    25

1. First N: For N=5

Emp ID    Emp Name    DOB           Age
1         Laura       11/11/1986    25
2         Desy        12/5/1981     30
3         Alex        30/5/1978     33
4         John        6/6/1979      32
5         Ted         4/7/1987      24

2. Last N: For N=4

Emp ID    Emp Name    DOB           Age
7         Anna        24/6/1965     46
8         Valerie     6/7/1990      21
9         Mary        19/9/1985     26
10        Martin      21/11/1986    25

3. Every N: Interval=3

Emp ID    Emp Name    DOB           Age
3         Alex        30/5/1978     33
6         Tom         30/6/1970     41
9         Mary        19/9/1985     26

4. Simple Random: For number of rows=2

The result can be any two rows, for example:

Emp ID    Emp Name    DOB           Age
7         Anna        24/6/1965     46
8         Valerie     6/7/1990      21

5. Systematic Random: Bucket Size=4

Emp ID    Emp Name    DOB           Age
2         Desy        12/5/1981     30
6         Tom         30/6/1970     41
10        Martin      21/11/1986    25

or

Emp ID    Emp Name    DOB           Age
1         Laura       11/11/1986    25
5         Ted         4/7/1987      24
9         Mary        19/9/1985     26
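The five sample types can be approximated in a few lines of Python. This is only a sketch of the selection logic as described in this section, not the component's actual implementation. The function names are hypothetical, the records are represented by their Emp ID values alone, and the assumption that a random sample is drawn without replacement is mine.

```python
import random


def first_n(rows, n):
    """First N: the first n records."""
    return rows[:n]


def last_n(rows, n):
    """Last N: the last n records."""
    return rows[-n:] if n > 0 else []


def every_n(rows, interval):
    """Every N: the records at 1-based positions interval, 2*interval, ..."""
    return rows[interval - 1::interval]


def simple_random(rows, n, rng=random):
    """Simple Random: n records chosen at random (assumed without replacement)."""
    return rng.sample(rows, n)


def systematic_random(rows, bucket_size, rng=random):
    """Systematic Random: one random position, reused in every bucket."""
    offset = rng.randrange(bucket_size)
    return rows[offset::bucket_size]


emp_ids = list(range(1, 11))          # Emp ID 1..10 from the example table
print(first_n(emp_ids, 5))            # [1, 2, 3, 4, 5]
print(last_n(emp_ids, 4))             # [7, 8, 9, 10]
print(every_n(emp_ids, 3))            # [3, 6, 9]
print(simple_random(emp_ids, 2))      # any two Emp IDs
print(systematic_random(emp_ids, 4))  # e.g. [2, 6, 10] or [1, 5, 9]
```

With a bucket size of 4 and ten records, the last bucket holds only two records, so a random position of 3 or 4 yields two rows instead of three.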

11.2.3 Data Type Definition

Use this component to change the name, data type, and date format of the source column. Defining the data type helps you to prepare data to make it suitable for further analysis.

For example,

• If the name of the column in the data source is "des", it may not be clear during analysis. You can change the name of the column to "Designation" in the analysis, so that the end users can easily understand it.

• If the date is stored in the mmddyy format (120201, without any date separator), the system may treat it as an integer value. Using the Data Type Definition component, you can change the date format to any valid format, such as mm/dd/yyyy or dd/mm/yyyy.

To change the name, data type, and the date format of the source column, perform the following steps:

1. Add the Data Type Definition component to the analysis.

2. Right-click the component and choose Configure Properties.

3. To change the column name, enter an alias name for the required source column.

4. To change the data type of the column, select the required data type for the source column.
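The date-format case described above can be reproduced outside the product with Python's standard library. The sketch below shows why the value 120201 looks like an integer and how declaring it as a date in mmddyy format allows it to be rendered in other formats. The variable names are illustrative assumptions.

```python
from datetime import datetime

raw_value = "120201"  # mmddyy, stored without a date separator

# Read as a number, the value carries no date meaning.
as_integer = int(raw_value)

# Declaring the format explicitly recovers the date: month 12, day 02, year 01.
parsed = datetime.strptime(raw_value, "%m%d%y").date()

print(as_integer)                   # 120201
print(parsed.strftime("%m/%d/%Y"))  # 12/02/2001
print(parsed.strftime("%d/%m/%Y"))  # 02/12/2001
```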

11.2.4 Filter

Use this component to filter rows and columns based on a specified condition.

Note:

• The In-DB Filter component does not support functions and advanced expressions.

• If you change the data source after configuring the filter component, the filter component still retains the previously defined row filters.

Filter Properties

Selected Columns    Select columns for analysis.
Filter Condition    Enter the filter condition.

Example: Filter out the "Store" column from the source data and apply the condition "Profit > 2000".

Store        Revenue    Profit
Land Mark    10000      1000
Spencer      20000      4500
Soch         25000      8000

1. In Selected Columns, clear the check box for the "Store" column.

2. In the Row Filter pane, choose the Profit column.

3. In the Select from Range option, enter 2000 in the From text box. Leave the To text box empty.

4. Choose OK.


6. Execute the analysis.

Revenue    Profit
20000      4500
25000      8000
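The effect of this example, dropping one column and keeping only the rows that satisfy the condition, can be sketched in Python as follows. The list-of-dictionaries representation and the variable names are assumptions made for illustration and say nothing about how the Filter component works internally.

```python
rows = [
    {"Store": "Land Mark", "Revenue": 10000, "Profit": 1000},
    {"Store": "Spencer", "Revenue": 20000, "Profit": 4500},
    {"Store": "Soch", "Revenue": 25000, "Profit": 8000},
]

# "Store" is cleared in Selected Columns, so only these columns remain.
selected_columns = ["Revenue", "Profit"]

# Row filter: keep rows whose Profit is greater than 2000.
filtered = [
    {column: row[column] for column in selected_columns}
    for row in rows
    if row["Profit"] > 2000
]

for row in filtered:
    print(row)
# {'Revenue': 20000, 'Profit': 4500}
# {'Revenue': 25000, 'Profit': 8000}
```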

Note: The Filter component supports only expressions that return a Boolean result.

For example, in the Employee table below:

Emp ID    Emp Name    DOB           Age    Date of Joining    Date of Confirmation
1         Laura       11/11/1986    25     12/9/2005          27/11/2005
2         Desy        12/5/1981     30     24/6/2000          10/7/2000
3         Alex        30/5/1978     33     10/10/1998         24/10/1998
4         John        6/6/1979      32     2/12/1999          20/12/1999

• The expression DAYSBETWEEN([Date of Joining],[Date of Confirmation]) is not a valid filter expression because it returns a numerical value. The correct usage of the DAYSBETWEEN expression in a filter is DAYSBETWEEN([Date of Joining],[Date of Confirmation]) == 14. This expression selects those rows where the number of days between "Date of Joining" and "Date of Confirmation" is 14. For the employee table above, the third row is selected.

• DAYNAME([Date of Joining]) == 'Saturday' selects the second and third rows in the employee table.
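Both filter results can be checked against the employee table with a short Python sketch. The helper names `days_between` and `day_name` are hypothetical stand-ins for DAYSBETWEEN and DAYNAME, and the dates are assumed to be in dd/mm/yyyy format, with the day count taken as the second date minus the first.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# (Emp Name, Date of Joining, Date of Confirmation), dd/mm/yyyy
employees = [
    ("Laura", "12/9/2005", "27/11/2005"),
    ("Desy", "24/6/2000", "10/7/2000"),
    ("Alex", "10/10/1998", "24/10/1998"),
    ("John", "2/12/1999", "20/12/1999"),
]


def parse(text):
    """Parse a dd/mm/yyyy string into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def days_between(start, end):
    """Stand-in for DAYSBETWEEN: whole days from start to end."""
    return (parse(end) - parse(start)).days


def day_name(text):
    """Stand-in for DAYNAME: English weekday name of the date."""
    return DAY_NAMES[parse(text).weekday()]


# Each condition returns a Boolean per row, which is what a filter needs.
confirmed_in_14_days = [name for name, joined, confirmed in employees
                        if days_between(joined, confirmed) == 14]
joined_on_saturday = [name for name, joined, _ in employees
                      if day_name(joined) == "Saturday"]

print(confirmed_in_14_days)  # ['Alex']
print(joined_on_saturday)    # ['Desy', 'Alex']
```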

Note:

• When entering a string literal that contains single quotation marks, each single quotation mark inside the string literal must be escaped with a backslash character. For example, enter 'Customer's' as 'Customer\'s'.

• When entering a column name that contains square brackets, each square bracket inside the column name must be escaped with a backslash character. For example, enter [Customer[Age]] as [Customer\[Age\]].
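If filter expressions are assembled by a script before being typed or pasted into the component, the two escaping rules above can be applied mechanically. The Python helpers below are a hypothetical sketch of those rules only. They do not address how a backslash already present in the text should be handled, because this guide does not say.

```python
def quote_string_literal(text):
    """Wrap text in single quotes, escaping embedded single quotes."""
    return "'" + text.replace("'", "\\'") + "'"


def quote_column_name(name):
    """Wrap a column name in square brackets, escaping embedded brackets."""
    escaped = name.replace("[", "\\[").replace("]", "\\]")
    return "[" + escaped + "]"


print(quote_string_literal("Customer's"))  # 'Customer\'s'
print(quote_column_name("Customer[Age]"))  # [Customer\[Age\]]
```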

Supported Functions

Note:

The Filter component does not support data manipulation functions.

Date         DAYSBETWEEN            Returns the number of days between two dates.
             CURRENTDATE            Returns the current system date.
             MONTHSBETWEEN          Returns the number of months between two dates. For example, the new column contains 2,0,2,0 when MONTHSBETWEEN([Date of Joining],[Date of Confirmation]) is applied to the Employee table.
             DAYNAME                Returns the day name in the string format. For example, the new column contains Monday, Saturday, Saturday, Thursday when DAYNAME([Date of Joining]) is applied to the Employee table.
             DAYNUMBEROFMONTH       Returns the day number of the particular month.
             DAYNUMBEROFWEEK        Returns the day number in a week. For example, Sunday=1, Monday=2.
             DAYNUMBEROFYEAR        Returns the day number in a year. For example, 1st Jan=1, 1st Feb=32, 3rd Feb=34.
             LASTDATEOFWEEK         Returns the date of the last day in a week.
             LASTDATEOFMONTH        Returns the date of the last day in a month.
             MONTHNUMBEROFYEAR      Returns the month number in a date. For example, Jan=1, Feb=2, Mar=3.
             WEEKNUMBEROFYEAR       Returns the week number in a year.
             QUARTERNUMBEROFDATE    Returns the quarter number in a date.

String       CONCAT                 Concatenates two strings. For example, CONCAT('USA', 'Australia') returns USAAustralia.
             INSTRING               Returns true if the search string is found in the source string. For example, INSTRING('USA', 'US') returns true.
             SUBSTRING              Returns a substring from the source string. For example, SUBSTRING('USA', 1,2) returns US.

Math         MAX                    Returns the maximum value in a column.
             MIN                    Returns the minimum value in a column.
             COUNT                  Returns the number of values in a column.
             SUM                    Returns the sum of the values in a column.
             AVERAGE                Returns the average of the values in a column.

Conditional  IF(condition) THEN(string expression/mathematical expression/conditional expression) ELSE(string expression/mathematical expression/conditional expression)
Expression                          Checks whether the condition is met, and returns one value if 'true' and another value if 'false'. For example, IF([Date of Joining]>12/9/2005) THEN ('Employee joined after Sept 12, 2005') ELSE ('Employee joined on or before Sept 12, 2005')
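The numbering conventions quoted in the table (Sunday=1 for the day of the week, 1st Feb=32 for the day of the year) can be verified with Python's standard library. The function names below are hypothetical stand-ins for the corresponding functions, and the quarter calculation assumes calendar quarters starting in January. WEEKNUMBEROFYEAR and LASTDATEOFWEEK are left out because this guide does not state which week convention they follow.

```python
import calendar
from datetime import date


def day_number_of_week(d):
    """Sunday=1, Monday=2, ..., Saturday=7."""
    return d.isoweekday() % 7 + 1


def day_number_of_year(d):
    """1st Jan=1, 1st Feb=32, and so on."""
    return d.timetuple().tm_yday


def last_date_of_month(d):
    """Date of the last day in the month of d."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, last_day)


def quarter_number_of_date(d):
    """Calendar quarter 1..4 (assumes quarters start in January)."""
    return (d.month - 1) // 3 + 1


joined = date(2005, 9, 12)  # Laura's Date of Joining, a Monday

print(day_number_of_week(joined))            # 2
print(day_number_of_year(date(2005, 2, 3)))  # 34
print(last_date_of_month(joined))            # 2005-09-30
print(quarter_number_of_date(joined))        # 3
```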

Note: Mathematical expressions containing functions that return a numerical value are not supported. For example, the expression DAYNUMBEROFMONTH(CURRENTDATE())==2 is not supported because DAYNUMBEROFMONTH returns a numerical value.

Mathematical Operators

Use mathematical operators to create formulas containing numerical columns and/or numbers. For example, the expression [Age] + 1 adds a new column with the values 26, 31, 34, 33.


11.3.1 CSV Writer

Use this component to write data to flat files such as CSV, TEXT, and DAT files.

CSV Writer Properties

File Name                 Select a .csv, .dat, or .txt file.
Overwrite                 To overwrite an existing file, select this option.
Column Separator          Select a column delimiter that separates data tokens in the file.
Quotation Character       Select the character to be added when writing the data.
Include Column Headers    Select this option to use the first row as column headers.
Encoding                  Select the text-encoding method to be used when writing the data.
Decimal Separator         Select the character to be used for decimal representation in digit grouping.
Grouping Separator        Select the character to be used as the thousands separator.
Number Format             Enter the number format you want to apply to numerical data.
Date Time Format          Select the date format you want to apply to dates.
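To make the role of the main properties concrete, the Python sketch below writes a small table with a configurable column separator, quotation character, and optional header row, using the standard `csv` module. It only mirrors the idea behind the properties and is not how the CSV Writer is implemented. The function and parameter names are assumptions, and the sample rows are borrowed from the Filter example.

```python
import csv
import io

rows = [
    {"Store": "Land Mark", "Revenue": 10000, "Profit": 1000},
    {"Store": "Spencer", "Revenue": 20000, "Profit": 4500},
    {"Store": "Soch", "Revenue": 25000, "Profit": 8000},
]


def write_rows(target, rows, column_separator=",", quotation_character='"',
               include_column_headers=True):
    """Write rows to an open text target using the given CSV settings."""
    writer = csv.DictWriter(
        target,
        fieldnames=list(rows[0].keys()),
        delimiter=column_separator,
        quotechar=quotation_character,
        quoting=csv.QUOTE_MINIMAL,  # quote only values that need it
        lineterminator="\n",
    )
    if include_column_headers:
        writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_rows(buffer, rows, column_separator=";")
print(buffer.getvalue())
```

When writing to a real file, the encoding would be passed to `open()`, for example `open(path, "w", encoding="utf-8", newline="")`. Opening with mode `"x"` instead of `"w"` makes Python refuse to replace an existing file, which is one way to model an Overwrite option that is not selected.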

11.3.2 JDBC Writer

Use this component to write data to relational databases such as MySQL, MS SQL Server, DB2, Oracle, SAP MaxDB, and SAP HANA.

JDBC Writer Properties

Database Type           Select the database type.
Database Driver Path    Enter the path of the JDBC driver. For example, to write to the Oracle database, you need to specify the location of the Oracle JDBC jar (C:\ojdbc6.jar).
Machine Name            Enter the name of the machine on which the database is installed.
Port Number             Enter the database or service port number.
Database Name           Enter the name of the database.
User Name               Enter the database user name.
Password                Enter the password for the database user.
Table Type              Enter the type of the table. This property is applicable when writing to the SAP HANA database.
Table Name              Enter the table name.
Overwrite               Select this option to overwrite the table if it already exists.

11.3.3 HANA Writer

Use this component to write data to SAP HANA database tables.

HANA Writer Component

Schema Name    Enter the name of the schema.
Table Type     Select the table type of the table to which you want to write data.
Table Name     Enter the name of the table.
Overwrite      Select this option to overwrite the table if it already exists.

11.4 Saved Models

Models that you create by saving the state of algorithms are listed under the Saved Models tab. The SAP Predictive Analysis application does not contain predefined models. Therefore, when you launch the application for the first time, the Saved Models tab does not appear.

For information on creating a new model, see the "Creating a Model" section under Working with Models.


More Information

Information Resource    Location

SAP BusinessObjects product information
    http://www.sap.com

SAP Help Portal
    Navigate to http://help.sap.com/businessobjects and on the "SAP BusinessObjects Overview" side panel click All Products.

    You can access the most up-to-date documentation covering all SAP BusinessObjects products and their deployment at the SAP Help Portal. You can download PDF versions or installable HTML libraries.

    Certain guides are stored on the SAP Service Marketplace and are not available from the SAP Help Portal. These guides are listed on the Help Portal accompanied by a link to the SAP Service Marketplace. Customers with a maintenance agreement have an authorized user ID to access this site. To obtain an ID, contact your customer support representative.

SAP Service Marketplace
    http://service.sap.com/bosap-support > Documentation

    • Installation guides: https://service.sap.com/bosap-instguides
    • Release notes: http://service.sap.com/releasenotes

    The SAP Service Marketplace stores certain installation guides, upgrade and migration guides, deployment guides, release notes and Supported Platforms documents. Customers with a maintenance agreement have an authorized user ID to access this site. Contact your customer support representative to obtain an ID. If you are redirected to the SAP Service Marketplace from the SAP Help Portal, use the menu in the navigation pane on the left to locate the category containing the documentation you want to access.

Docupedia
    https://cw.sdn.sap.com/cw/community/docupedia

    Docupedia provides additional documentation resources, a collaborative authoring environment, and an interactive feedback channel.

Developer resources
    https://boc.sdn.sap.com/

    https://www.sdn.sap.com/irj/sdn/businessobjects-sdklibrary


SAP BusinessObjects articles on the SAP Community Network
    https://www.sdn.sap.com/irj/boc/businessobjects-articles

    These articles were formerly known as technical papers.

Notes
    https://service.sap.com/notes

    These notes were formerly known as Knowledge Base articles.

Forums on the SAP Community Network
    https://www.sdn.sap.com/irj/scn/forums

Training
    http://www.sap.com/services/education

    From traditional classroom learning to targeted e-learning seminars, we can offer a training package to suit your learning needs and preferred learning style.

Online customer support
    The SAP Support Portal contains information about Customer Support programs and services. It also has links to a wide range of technical information and downloads. Customers with a maintenance agreement have an authorized user ID to access this site. To obtain an ID, contact your customer support representative.

Consulting
    http://www.sap.com/services/bysubject/businessobjectsconsulting

    Consultants can accompany you from the initial analysis stage to the delivery of your deployment project. Expertise is available in topics such as relational and multidimensional databases, connectivity, database design tools, and customized embedding technology.
