8/9/2019 pa1_0_7_user_en
http://slidepdf.com/reader/full/pa107useren 1/86
SAP Predictive Analysis User Guide
■ SAP Predictive Analysis 1.0.7
2012-11-19
Copyright
© 2012 SAP AG. All rights reserved. SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company. Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company. Crossgate, m@gic EDDY, B2B 360°, B2B 360° Services are registered trademarks of Crossgate AG in Germany and other countries. Crossgate is an SAP company. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
Contents
Chapter 1 About this Guide
1.1 What this Guide Contains
1.2 Target Audience
Chapter 2 Release Restrictions
Chapter 3 SAP Predictive Analysis Overview
Chapter 4 Installing SAP Predictive Analysis
4.1 Installation prerequisites
4.2 To install SAP Predictive Analysis using the setup program
4.3 To uninstall SAP Predictive Analysis
4.4 Important considerations for using SAP HANA
4.4.1 To configure _SYS_REPO for the SAP Predictive Analysis user
4.4.2 Supported OLAP measures
4.5 Important considerations for using SAP BusinessObjects Universes
Chapter 5 Open-Source R Installation and Configuration
5.1 Installing R-2.15.1 and the Required Packages
5.2 Configuring R
Chapter 6 Getting Started with SAP Predictive Analysis
6.1 Basics of SAP Predictive Analysis
6.2 Launching SAP Predictive Analysis
6.3 Understanding SAP Predictive Analysis
6.3.1 Designer View
6.3.2 Results View
6.4 Using SAP Predictive Analysis from Start to Finish
Chapter 7 Building Analyses
7.1 Creating an Analysis
7.1.1 Acquiring Data from a Data Source
7.1.2 Preparing Data for Analysis
7.1.3 Applying Algorithms
7.1.4 Storing Results of the Analysis
7.2 Running the Analysis
7.3 Saving the Analysis
7.4 Viewing Results
Chapter 8 Analyzing Data
8.1 Visualization Charts
8.1.1 Scatter Matrix Chart
8.1.2 Statistical Summary Chart
8.1.3 Parallel Coordinates
8.1.4 Decision Tree
8.1.5 Regression Chart
8.1.6 Time Series Chart
8.1.7 Cluster Chart
Chapter 9 Working with Models
9.1 Creating a Model
9.2 Viewing Model Information
9.3 Exporting a Model as PMML
9.4 Deleting a Model
Chapter 10 Use Case Scenarios
10.1 Sales Forecasting
10.2 Retail Store Segmentation
Chapter 11 Component Properties
11.1 Algorithms
11.1.1 Regression
11.1.2 Outliers
11.1.3 Time Series
11.1.4 Decision Trees
11.1.5 Neural Network
11.1.6 Clustering
11.1.7 Association
11.1.8 Classification
11.2 Data Preparation Components
11.2.1 Formula
11.2.2 Sample
11.2.3 Data Type Definition
11.2.4 Filter
11.3 Data Writers
11.3.1 CSV Writer
11.3.2 JDBC Writer
11.3.3 HANA Writer
11.4 Saved Models
Appendix A More Information
About this Guide
1.1 What this Guide Contains
This guide provides:
• An overview of SAP Predictive Analysis
• Information on how to install and configure SAP Predictive Analysis
• Information on various algorithms and components available in SAP Predictive Analysis
• Information on how to create analyses and models
• Information on how to analyze data using predictive analysis visualization techniques
This guide does not cover:
• How to acquire data from various data sources
• How to perform data manipulation, data cleansing, and semantic enrichment operations in the Prepare panel
• How to share charts and datasets
Note:
SAP Predictive Analysis inherits data acquisition and data manipulation functionality from SAP Visual Intelligence. Therefore, for information about workflows not covered in this guide, see the SAP Visual Intelligence User Guide available at http://help.sap.com/vi. We recommend that you read the SAP Visual Intelligence User Guide in combination with the SAP Predictive Analysis User Guide to understand the complete workflow for analyzing data using predictive analysis algorithms.
1.2 Target Audience
This guide is intended for professional data analysts, business analysts, and information designers who want to use the SAP Predictive Analysis application to analyze and visualize data using predictive algorithms.
Note:
To use the SAP Predictive Analysis application, you need to be familiar with statistical and data mining algorithms and have a basic understanding of how to use these algorithms.
Release Restrictions
The following are the known issues and limitations in this release:
• The application might crash when viewing charts with large input data.
To work around this issue, you need to either remove or modify the -Xmx parameter in the SAPPredictiveAnalysis.ini file, depending on your system configuration.
While working with the application, if the memory consumed by the application is not released, you may experience a delay in opening the document. To work around this issue, relaunch the application.
• An error occurs when you try to save the document (.SViD file) from the Predict view with saved visualizations created in the Prepare view. You may also encounter a similar error when you try to save the document (.SViD file) after using the Enrich All option in the Visualize pane of the Predict view.
To work around this issue, save the document by switching to the Prepare view.
• To enable R algorithms from within the SAP Predictive Analysis application, you need to have access rights to update files in the SAP Predictive Analysis install location. If you do not have these rights, contact your IT administrator to obtain them.
• You need to use the INSTRING function in a formula of the filter component in the following format: INSTRING('String','String') == 'true' / 'false'
• You can configure certain components in an analysis even though they are not connected to the reader component. However, when you try to run the analysis, an error occurs.
• The application hangs if you try to run an analysis in which the names of the selected columns in the acquired data set contain special characters, such as ~!@#$%^&*()_+`-={}|[]\:";'<>?,./.
To work around this issue, rename the column before navigating to the Predict view.
• The application cannot render a decision tree if there are more than 32 distinct categorical values for a dependent column.
• After acquiring data from the HANA Online data source, if you apply a filter in the Prepare view, create and execute an analysis in the Predict view, view the analysis results, and then navigate back to the Prepare view, the applied filter is not retained in the Prepare view.
• If you close the SAP Predictive Analysis application while R is being installed, the R installation does not stop immediately. To end the installation, you need to kill the corresponding powershell.exe process using Microsoft Windows Task Manager.
• If your existing R packages are corrupted, you cannot use the Install and Configure R option to install R packages. To use the Install and Configure R option from the application, you need to manually remove the corrupted R packages.
• Data size limits for visualizations:
Chart                    Number of Rows Supported
Scatter Matrix Chart     240 rows
Parallel Coordinates     2500 rows
Time Series Chart        3000 rows
Regression Chart         3000 rows

Note:
For the Parallel Coordinates, Time Series, and Regression charts, the application takes some time to render the chart if the input data is more than 1000 rows.
• When viewing the scatter matrix chart with a large data set, the application displays the message "Loading, please wait", and the chart is not displayed.
To work around this issue, reduce the input data size, run the analysis, and view the chart.
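Several of the restrictions above (special characters in column names, the 32-value limit for decision trees, and the row limits per chart) can be checked on a data set before it is loaded into the application. The following Python sketch illustrates one way to do this outside the product. It is not part of SAP Predictive Analysis; the function names and the renaming rule (replace every run of non-alphanumeric characters with a single space) are assumptions made for illustration.

```python
import re

# Row limits per chart, copied from the table in this chapter.
CHART_ROW_LIMITS = {
    "Scatter Matrix Chart": 240,
    "Parallel Coordinates": 2500,
    "Time Series Chart": 3000,
    "Regression Chart": 3000,
}

# Decision trees cannot be rendered above this many distinct dependent values.
MAX_TREE_CATEGORIES = 32


def safe_column_name(name):
    """Replace each run of special characters (including '_') with one space.

    Hypothetical renaming rule; any scheme that removes the special
    characters listed above would serve the same purpose.
    """
    return re.sub(r"[\W_]+", " ", name).strip()


def charts_over_limit(row_count):
    """Return the charts whose documented row limit is below row_count."""
    return [chart for chart, limit in CHART_ROW_LIMITS.items() if row_count > limit]


def too_many_categories(values):
    """Return True if a dependent column has more than 32 distinct values."""
    return len(set(values)) > MAX_TREE_CATEGORIES


if __name__ == "__main__":
    print(safe_column_name("Sales (EUR)"))
    print(charts_over_limit(2600))
```

A check like this only flags the documented limits; reducing the input data or renaming columns still has to be done before navigating to the Predict view.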
SAP Predictive Analysis Overview
SAP Predictive Analysis is a statistical analysis and data mining solution that enables you to build predictive models to discover hidden insights and relationships in your data, from which you can make predictions about future events.
With SAP Predictive Analysis, you can perform various analyses on the data, including time series forecasting, outlier detection, trend analysis, classification analysis, segmentation analysis, and affinity analysis. This application enables you to analyze data using different visualization techniques, such as scatter matrix charts, parallel coordinates, cluster charts, and decision trees.
SAP Predictive Analysis offers a range of predictive analysis algorithms, supports the use of the R open-source statistical analysis language, and offers in-memory data mining capabilities for handling large volume data analysis efficiently.
Note:
SAP Predictive Analysis inherits data acquisition and data manipulation functionality from SAP Visual Intelligence. SAP Visual Intelligence is a data manipulation and visualization tool. Using SAP Visual Intelligence, you can connect to various data sources such as flat files, relational databases, in-memory databases, and SAP BusinessObjects universes. You can operate on different volumes of data, from a small matrix of data in a CSV file to a very large dataset in SAP HANA, and you can select, clean, and manipulate data.
Installing SAP Predictive Analysis
4.1 Installation prerequisites
Before installing SAP Predictive Analysis, make sure the following requirements are met:
• You must have the Microsoft Windows 7 operating system installed on your machine. SAP Predictive Analysis is supported on both 32-bit and 64-bit machines.
• If you have already installed SAP Visual Intelligence on your machine, you need to uninstall it before installing SAP Predictive Analysis.
• You must have administrator rights to install SAP Predictive Analysis on the computer.
• Sufficient disk space must be available on the following resources:
Resource                                          Required Space
Drive hosting the User application data folder    2.5 GB
User temporary folder (\AppData\Local\Temp)       200 MB
Drive hosting the installation directory          500 MB
• The following ports must be available:
Port                                 Required by
6401                                 Sybase IQ database
Any port in the range 4520-4539      SAP Predictive Analysis installation
For a detailed list of supported environments and hardware requirements, see the Product Availability Matrix at: http://service.sap.com/pam
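The disk space and port requirements above can be verified with a short script before running the setup program. The following Python sketch is an illustration only and is not shipped with the product. It assumes that a port counts as available if a local TCP socket can be bound to it, and all function names are hypothetical.

```python
import shutil
import socket

SYBASE_IQ_PORT = 6401                  # required by the Sybase IQ database
INSTALLER_PORTS = range(4520, 4540)    # 4520-4539 inclusive

GB = 1024 ** 3
MB = 1024 ** 2


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_installer_port():
    """Return the first bindable port in 4520-4539, or None if all are taken."""
    for port in INSTALLER_PORTS:
        if is_port_free(port):
            return port
    return None


def free_space_ok(path, required_bytes):
    """Return True if the drive hosting path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    print("Port 6401 free:", is_port_free(SYBASE_IQ_PORT))
    print("Installer port:", first_free_installer_port())
    # "." stands in for the folders named in the table; adjust as needed.
    print("2.5 GB available:", free_space_ok(".", int(2.5 * GB)))
    print("500 MB available:", free_space_ok(".", 500 * MB))
```

A successful bind only shows that the port is free at the moment of the check; another process could still claim it before the installation starts.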
4.2 To install SAP Predictive Analysis using the setup program
1. Run the setup.exe file.
The "User Account Control" dialog box appears with a warning message.
2. Choose Yes in the confirmation prompt.
3. Specify the destination folder for installing SAP Predictive Analysis.
• To accept the default installation directory, choose Next.
• To navigate to the folder where you want to install SAP Predictive Analysis, choose Browse. Select the required folder and choose Next.
The "License Agreement" page opens.
4. Review the license agreement, select I accept the License Agreement, and choose Next.
5. To begin the installation, choose Next.
The installation is complete when the "Finish Installation" page opens.
6. To exit the installation, choose Finish.
4.3 To uninstall SAP Predictive Analysis
1. Choose Start > Control Panel > Programs.
2. Choose Uninstall a program.
3. Right-click SAP Predictive Analysis and choose Uninstall.
The SAP Predictive Analysis Setup wizard appears.
4. On the Confirm Uninstall page, choose Next.
5. To complete the uninstallation, choose Finish.
4.4 Important considerations for using SAP HANA
This section contains important considerations and requirements for using SAP Predictive Analysis with the SAP HANA database.
Security requirements for publishing to SAP HANA
Before users can publish content to SAP HANA, they must be assigned specific privileges and roles. These roles and privileges are also required for retrieving data from SAP HANA. Use the SAP HANA Studio application to assign user roles and privileges. For information on administrating the SAP HANA database and using SAP HANA Studio, see the SAP HANA Database – Administration Guide. For information on user security, see the SAP HANA Security Guide (Including SAP HANA Database Security).
The user account used to log into the SAP HANA system from SAP Predictive Analysis must be assigned the "MODELING" role (in SAP HANA).
Note:
This action can only be performed by a user with ROLE_ADMIN privileges on the SAP HANA database.
When an SAP Predictive Analysis user logs into the SAP HANA system, the internal _SYS_REPO account must:
• Be granted the SELECT SQL privilege.
• Have the Grantable to others option selected in the (SAP Predictive Analysis) user's schema.
4.4.1 To configure _SYS_REPO for the SAP Predictive Analysis user
If an account for the SAP Predictive Analysis user is already defined in the SAP HANA system:
1. From the system connection in the SAP HANA Studio Navigator window, choose Catalog > Authorization > Users.
2. Double-click the _SYS_REPO account.
3. On the SQL Privileges tab, click the + icon, enter the name of the user's schema, and choose OK.
4. Choose SELECT and the corresponding Yes under Grantable to others.
5. Choose Deploy or Save.
Note:
Users can also open an SQL editor in SAP HANA Studio and run the following SQL statement:
GRANT SELECT ON SCHEMA <user_account_name> TO _SYS_REPO WITH GRANT OPTION
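When the statement above has to be issued for several user schemas, it can be generated rather than typed by hand. The following Python sketch only builds the statement text shown above; the function name and the identifier check are assumptions added for illustration. The check is there because a schema name cannot be passed as a bound parameter and is therefore inserted into the statement directly.

```python
import re

# Accept only simple, unquoted identifiers (an assumption of this sketch),
# because the schema name is inserted directly into the statement text.
_SIMPLE_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def grant_select_to_sys_repo(schema_name):
    """Build the GRANT statement from this section for one user schema."""
    if not _SIMPLE_IDENTIFIER.match(schema_name):
        raise ValueError(f"not a simple schema identifier: {schema_name!r}")
    return f"GRANT SELECT ON SCHEMA {schema_name} TO _SYS_REPO WITH GRANT OPTION"


if __name__ == "__main__":
    # PA_USER is a hypothetical schema name.
    print(grant_select_to_sys_repo("PA_USER"))
```

The resulting string can be pasted into the SQL editor in SAP HANA Studio. Executing it still requires an account with sufficient privileges.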
4.4.2 Supported OLAP measures
SAP HANA supports only the following measures of aggregation in OLAP data sources:
• SUM
• MIN
• MAX
• COUNT
If your dataset contains an aggregation on a measure that is not listed above, the aggregation will be ignored by SAP HANA during publication and it will not be part of the final published artifact.
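To see in advance which aggregations would be ignored during publication, the measures of a dataset can be compared against the supported list. The following Python sketch is a minimal illustration. The input format (a mapping from measure name to aggregation name) and the example measure names are assumptions and do not reflect a product API.

```python
SUPPORTED_AGGREGATIONS = {"SUM", "MIN", "MAX", "COUNT"}


def split_measures(measures):
    """Split a {measure name: aggregation} mapping into (kept, ignored).

    'ignored' holds the measures whose aggregation is not in the
    supported list and would therefore be dropped during publication.
    """
    kept, ignored = {}, {}
    for name, aggregation in measures.items():
        if aggregation.upper() in SUPPORTED_AGGREGATIONS:
            kept[name] = aggregation
        else:
            ignored[name] = aggregation
    return kept, ignored


if __name__ == "__main__":
    # Hypothetical measures; AVG stands in for any unsupported aggregation.
    kept, ignored = split_measures({"Revenue": "SUM", "Unit Price": "AVG"})
    print("kept:", kept)
    print("ignored:", ignored)
```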
4.5 Important considerations for using SAP BusinessObjects Universes
• To acquire data from universes that exist on the BI 4.0 platform, ensure that the Web Intelligence Server is running.
• You also need to ensure that your Business Intelligence platform is at BI 4.0 SP2 patch level 14 or above.
Note:
You can also acquire data from universes that exist on BI 4.0 SP3 and BI 4.0 SP4 platforms.
Open-Source R Installation and Configuration
5.1 Installing R-2.15.1 and the Required Packages
To use open-source R algorithms in your analysis, you need to install the R environment and configure it with the SAP Predictive Analysis application.
SAP Predictive Analysis provides an option to install and configure R 2.15.1 and the required packages from within the application. Ensure that you are connected to the internet while installing R.
To install the R environment and the required packages, perform the following steps:
1. Launch the SAP Predictive Analysis application.
2. From the File menu, choose Install and Configure R.
3. Select Install R.
4. Read the open-source R license agreement and the important instructions, and select I agree to install R using the script.
5. Select OK.
Note:
If you have already installed R 2.15.x, you can use this procedure to install the required R packages.
5.2 Configuring R
After you have installed R, you need to configure the R environment to enable R algorithms in the application. If you have already installed R-2.11.1 or R-2.15.1 and the required packages, you can skip the R installation step and directly configure R.
Note:
Before configuring R-2.11.1, you need to set certain environment variables. For example, if you have installed R at C:\Program Files\R\R-2.11.1, then you need to set the environment variables as follows:
• R_HOME = C:\Program Files\R\R-2.11.1
• R_LIBS = C:\Program Files\R\R-2.11.1\library
• Path = existing path; C:\Program Files\R\R-2.11.1\library\rJava\jri; C:\Program Files\R\R-2.11.1\bin
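The three values above all follow from the R installation folder, so they can be derived rather than typed. The following Python sketch computes them for a given folder. It is an illustration only: the function name is hypothetical, and the script returns the values without changing any system settings.

```python
import ntpath  # Windows path rules, regardless of the OS running this script


def r_environment(r_home, existing_path):
    """Derive R_HOME, R_LIBS, and Path from the R installation folder."""
    r_libs = ntpath.join(r_home, "library")
    jri_dir = ntpath.join(r_libs, "rJava", "jri")
    bin_dir = ntpath.join(r_home, "bin")
    return {
        "R_HOME": r_home,
        "R_LIBS": r_libs,
        # Append the two R folders to whatever Path already contains.
        "Path": ";".join([existing_path, jri_dir, bin_dir]),
    }


if __name__ == "__main__":
    # C:\Windows stands in for the existing Path value.
    env = r_environment(r"C:\Program Files\R\R-2.11.1", r"C:\Windows")
    for name, value in env.items():
        print(f"{name} = {value}")
```

The printed values can then be entered in the Windows environment variable settings.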
To configure R, perform the following steps:
1. Launch the SAP Predictive Analysis application.
2. From the File menu, choose Install and Configure R.
3. On the Configuration tab, select Enable Open Source R Algorithms.
4. Choose Browse to select the R installation folder.
5. Choose OK.
The "User Account Control" dialog box appears with a warning message.
6. Choose Yes in the confirmation prompt.
Getting Started with SAP Predictive Analysis
6.1 Basics of SAP Predictive Analysis
Component
A component is the basic processing unit of SAP Predictive Analysis. Each component contains input and/or output anchors (connection points). These anchors are used to connect components through connectors. When you connect components together, data is transmitted from predecessor components to their successor components.
SAP Predictive Analysis consists of the following components:
• Data preparation
• Algorithms
• Data writers
You can access components from the Designer view of the Predict panel. After you have added components to the analysis editor, the status icon of a component allows you to identify its state.
The following are the states of a component:
• Not Configured: This state is displayed when you drag a component onto the analysis editor. It indicates that the component needs to be configured before running the analysis.
• Configured: This state is displayed once all the necessary properties are configured for the component.
• Success: This state is displayed after the successful execution of the analysis.
• Failure: This state is displayed if this component causes the execution of the analysis to fail.
Analysis
An analysis is a series of different components connected together in a particular sequence with connectors, which define the direction of the data flow.
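The relationship between components, connectors, states, and an analysis can be pictured as a small data structure. The following Python sketch models it under simplifying assumptions: each component is a function from rows to rows, connectors are plain successor links, and running the analysis pushes data from a predecessor to its successors. The class and function names are hypothetical and do not reflect the product's internal design.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    NOT_CONFIGURED = "Not Configured"
    CONFIGURED = "Configured"
    SUCCESS = "Success"
    FAILURE = "Failure"


@dataclass
class Component:
    name: str
    run: object = None                        # callable rows -> rows once configured
    successors: list = field(default_factory=list)
    state: State = State.NOT_CONFIGURED

    def configure(self, run):
        """Set the processing function and mark the component as configured."""
        self.run = run
        self.state = State.CONFIGURED

    def connect(self, successor):
        """Add a connector from this component to a successor component."""
        self.successors.append(successor)
        return successor


def run_analysis(component, rows):
    """Run a component, then pass its output to each successor in turn.

    Returns {component name: output}. An unconfigured component stops the run.
    """
    if component.state is State.NOT_CONFIGURED:
        raise RuntimeError(f"{component.name} is not configured")
    try:
        output = component.run(rows)
    except Exception:
        component.state = State.FAILURE
        raise
    component.state = State.SUCCESS
    results = {component.name: output}
    for successor in component.successors:
        results.update(run_analysis(successor, output))
    return results


if __name__ == "__main__":
    reader = Component("reader")
    reader.configure(lambda _rows: [1, 2, 3, 4])
    keep_large = Component("filter")
    keep_large.configure(lambda rows: [r for r in rows if r > 2])
    reader.connect(keep_large)
    print(run_analysis(reader, None))
```

In this model the reader is the entry point, so a component that is never connected to it simply never receives data. The application itself reports an error in that situation.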
Model
A model is a reusable component created by training an algorithm using historical data.
In-Database (In-DB)
In-database (in-DB) is an analysis execution mode in which data processing is performed within the database using data mining capabilities. In this mode, the data is never taken out of the database for processing, which avoids data transfer and keeps processing fast. This mode can be used to process large data sets. SAP HANA supports in-DB data mining through R integration and the Predictive Analysis Library (PAL).
In-Process (In-Proc)
In-Process is an analysis execution mode in which the data processing is performed by taking data out of the database into the predictive analysis process space. This type of analysis is also referred to as Out-DB analysis.
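The difference between the two execution modes can be shown with a small stand-in. The following Python sketch uses an in-memory SQLite database purely as a substitute for a real database such as SAP HANA, with a hypothetical table and data. In the first function the database computes the result and only the aggregate leaves it. In the second, every row is fetched and the computation runs in the client process.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 30.0), ("south", 5.0)],
    )
    return conn


def mean_in_db(conn):
    """In-DB style: the database aggregates; only one row per region is returned."""
    query = "SELECT region, AVG(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


def mean_in_process(conn):
    """In-Proc style: all rows are pulled out and aggregated in this process."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        running_sum, count = totals.get(region, (0.0, 0))
        totals[region] = (running_sum + amount, count + 1)
    return {region: s / n for region, (s, n) in totals.items()}


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_in_db(conn))
    print(mean_in_process(conn))
```

Both functions return the same averages. They differ in how much data crosses the database boundary, which is why the in-DB mode suits large data sets.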
6.2 Launching SAP Predictive Analysis
To launch SAP Predictive Analysis, choose Start > All Programs > SAP Business Intelligence > SAP Predictive Analysis > SAP Predictive Analysis.
6.3 Understanding SAP Predictive Analysis
When you launch SAP Predictive Analysis, the home page appears. The home page contains information that helps you get started with SAP Predictive Analysis.
To start analyzing data using SAP Predictive Analysis, you need to first connect to the data source and acquire data for analysis. After acquiring data, you can perform the following operations on data:
• Prepare data for analysis by applying data manipulation and data cleansing functions
• Analyze data by applying data mining and statistical analysis algorithms
• Share datasets and charts with external collaborators
Note:
This guide describes how to analyze data by applying data mining and statistical analysis algorithms. For information on how to acquire data, prepare data, and share datasets, see the SAP Visual Intelligence User Guide available at http://help.sap.com/vi.
Once you have acquired data from the data source, you need to switch to the Predict panel to analyze data.
6.3.1 Designer View
The Designer view enables you to design and run analyses, and to create predictive models.
6.3.2 Results View
The Results view enables you to understand data and analysis results by using various visualization techniques and intuitive charts.
6.4 Using SAP Predictive Analysis from Start to Finish
The following is an overview of the process you can follow to build a chart based on a dataset. The process is not a linear one, and you can move from one step back to a preceding step to fine-tune your chart or data.
DescriptionSteps to work with your da- ta
If your data source is:• RDBMS: Enter your credentials, connect to the database server,
browse and select a data source; for example, if you are connectingto SAP HANA, you select a view and cube to build your chart.
• Flat file: Choose the columns to be acquired, trimmed, or shownand hidden.
• Universe: Enter your universe credentials, connect to the CentralManagement Server repository, and select a universe to build your chart.
Connect to your data source.
Note:For information on how toconnect to your data source,see the Connecting to your
data source section of theSAP Visual Intelligence User
Guide .
You can view the data acquired as columns or as facets. You can or-ganize the data display to make chart building easier by doing thefollowing:• Create filters and hide unneeded columns
• Create measures, time hierarchies, and geography hierarchies• Clean and organize the data in columns using a range of manipu-
lation tools
• Create columns with formulas using a wide selection of availablefunctions
View and organize thecolumns and attributes.
Note:
For information on how toview columns and attributes,see the Preparing your data
section of the SAP Visual
Intelligence User Guide .
Step: Analyze your data using predictive analysis algorithms.
Note: This guide provides information on how to analyze data using predictive analysis algorithms.
Description: Once you have acquired the relevant data in the Prepare panel, switch to the Predict panel and create an analysis to find patterns in the data and predict future outcomes.
In the Predict panel, you can do the following:
• Create an analysis
• Build predictive models
• View analysis results
• View model visualizations
• Build charts
Note: For information on building charts, see the Visualizing your data section of the SAP Visual Intelligence User Guide.
Step: Save your analysis.
Description: Name and save the analysis that includes your charts. The analysis is saved in a document with the file format .SViD in the application folder under Documents in your profile path.
Building Analyses
7.1 Creating an Analysis
You can use SAP Predictive Analysis to perform data mining and statistical analysis by running data through a series of components. The series of components must be connected to each other with connectors, which define the direction of the data flow. This process is referred to as analysis. Using analysis, you can read data from a data source, analyze data by applying data manipulation functions and data mining and statistical algorithms, and store the results of the analysis.
To create an analysis, perform the following steps:
1. Acquire data from a data source
2. (Optional) Prepare the data for analysis (for example, by filtering the data)
3. Apply algorithms
4. (Optional) Store the results of the analysis for further analysis
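The chain of connected components described above can be pictured as a list of functions applied in order, each receiving the output of the previous one. The following Python sketch is illustrative only: the names `read_rows`, `filter_rows`, `add_mean_column`, and `run_analysis` are hypothetical and are not part of SAP Predictive Analysis.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Component = Callable[[List[Row]], List[Row]]


def read_rows(_: List[Row]) -> List[Row]:
    """Data reader: ignores its input and returns a small in-memory dataset."""
    return [{"sales": 10.0}, {"sales": 25.0}, {"sales": 40.0}]


def filter_rows(rows: List[Row]) -> List[Row]:
    """Optional data preparation: keep rows with sales of at least 20."""
    return [r for r in rows if r["sales"] >= 20.0]


def add_mean_column(rows: List[Row]) -> List[Row]:
    """Stand-in for an algorithm: append the mean of 'sales' to every row."""
    mean = sum(r["sales"] for r in rows) / len(rows)
    return [{**r, "mean_sales": mean} for r in rows]


def run_analysis(components: List[Component]) -> List[Row]:
    """Pass the data through each component in connection order."""
    rows: List[Row] = []
    for component in components:
        rows = component(rows)
    return rows


result = run_analysis([read_rows, filter_rows, add_mean_column])
```

The order of the list plays the role of the connectors: it fixes the direction in which data flows from reader to preparation to algorithm.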
Related Topics
• Acquiring Data from a Data Source
• Preparing Data for Analysis
• Applying Algorithms
• Storing Results of the Analysis
7.1.1 Acquiring Data from a Data Source
1. On the Home page, choose the New Document button in the top left corner.
2. Connect to or browse to your data source.
You can acquire data from the following data sources:
Data Source: Description
• CSV file: You can acquire data from a comma-separated value data file and perform in-process (in-proc) analysis using SAP and R algorithms.
• Free hand SQL: You can create your own data provider by manually entering the SQL for a target data source and perform in-process (in-proc) analysis using SAP and R algorithms.
• SAP HANA Offline: You can acquire data from SAP HANA tables, views, and analysis views and perform in-process (in-proc) analysis using SAP and R algorithms.
• SAP HANA Online: You can acquire data from SAP HANA tables, views, and analysis views and perform in-database (in-db) analysis using HANA PAL algorithms.
• MS Excel: You can acquire data from a Microsoft Excel spreadsheet and perform in-process (in-proc) analysis using SAP and R algorithms.
• Universe 3.x: You can acquire data from SAP BusinessObjects universes that exist on the XI 3.x platform and perform in-process (in-proc) analysis using SAP and R algorithms.
• Universe 4.x: You can acquire data from SAP BusinessObjects universes that exist on the BI 4.x platform and perform in-process (in-proc) analysis using SAP and R algorithms.
3. Choose Acquire or Select as required.
The columns appear in the Data pane, the attributes and measures to the left in the Semantic pane.You are now ready to start building your analysis. In the Predict panel, the configured data reader component is added to the analysis editor. You can run the analysis to see the results of the data reader component.
Note: For information on how to connect to a specific data source, see the SAP Visual Intelligence User Guide available at http://help.sap.com/vi.
7.1.2 Preparing Data for Analysis
This is an optional step.
In many cases, the raw data from the data source may not be suitable for analysis. For accurate results, you may need to prepare and process the data before analysis. You can find data manipulation functions in the Prepare panel and data preparation functions in the Predict panel.
Data preparation involves checking data for accuracy and missing fields, filtering data based on range values, sampling the data to investigate a subset of data, and manipulating data. You can process data using data preparation components.
1. In the Predict panel, double-click the required data preparation component from the Data Preparation tab.
The data preparation component is added to the analysis editor and an automatic connection is created to the data reader component.
2. Right-click the data preparation component and choose Configure Properties.
3. In the component properties dialog box, enter the necessary details for the data preparation component properties.
4. Choose Save and Close.
5. To view the results of the data reader component and the data preparation component, choose .
Related Topics
• Data Preparation Components
7.1.3 Applying Algorithms
Once you have the relevant data for analysis, you need to apply appropriate algorithms to determine patterns in the data.
Determining an appropriate algorithm to use for a specific purpose is a challenging task. You can use a combination of a number of algorithms to analyze data. For example, you can first use time series algorithms to smooth data and then use regression algorithms to find trends.
The following table provides information on which algorithm to choose for specific purposes:
Time Series Algorithms
• Triple Exponential Smoothing
• R-Single Exponential Smoothing
• R-Double Exponential Smoothing
• R-Triple Exponential Smoothing
Purpose: Performing time-based predictions
Regression Algorithms
• Linear Regression
• Exponential Regression
• Geometric Regression
• Logarithmic Regression
• HANA Multiple Linear Regression
• R-Linear Regression
• R-Exponential Regression
• R-Geometric Regression
• R-Logarithmic Regression
• R-Multiple Linear Regression
Purpose: Predicting continuous variables based on other variables in the dataset
Association Algorithms
• HANA Apriori
• R-Apriori
Purpose: Finding frequent itemset patterns in large transactional datasets to generate association rules
Clustering Algorithms
• HANA K-Means
• K-Means
Purpose: Clustering observations into groups of similar itemsets
Decision Trees
• HANA C4.5
• R-CNR Tree
Purpose: Classifying and predicting one or more discrete variables based on other variables in the dataset
Outlier Detection Algorithms
• Inter Quartile Range
• Nearest Neighbour Outlier
Purpose: Detecting outlying values in the dataset
Neural Network Algorithms
• R-NNet Neural Network
• R-MONMLP Neural Network
Purpose: Forecasting, classification, and statistical pattern recognition
1. In the Predict panel, double-click the required algorithm component from the Algorithms tab.
The algorithm component is added to the analysis editor and is connected to the previous component in the analysis.
2. Right-click the algorithm component and choose Configure Properties.
3. In the component properties dialog box, enter the necessary details for the algorithm component properties.
4. Choose Save and Close.
5. To view the results of the data reader component, data preparation component, and algorithm, choose .
Related Topics
• Algorithms
7.1.4 Storing Results of the Analysis
This is an optional step.
You can store the results of the analysis in flat files or databases for further analysis using data writer components.
1. In the Predict panel, double-click the required data writer component from the Data Writers tab.
The data writer component is added to the analysis editor and is connected to the previous component in the analysis.
2. Right-click the data writer component and choose Configure Properties.
3. In the component properties dialog box, enter the necessary details for the data writer component properties.
4. Choose Save and Close.
5. To view the results of the data reader component, data preparation component, algorithm, and data writer component, choose .
Related Topics
• Data Writers
7.2 Running the Analysis
To run the analysis, choose in the analysis editor toolbar or right-click the last component in the analysis, and choose Run Analysis.
If your analysis is very large and complex, you can run the analysis component by component and analyze the data. To run part of the analysis, choose in the analysis editor toolbar or right-click the component up to which you want to run, and choose Run Till Here.
7.3 Saving the Analysis
After creating an analysis, you can save it for reuse in the future. In SAP Predictive Analysis, you need to save the document to save the corresponding analysis. The document is saved in the .SViD file format. The saved document contains the dataset (data reader component) you acquired from the data source and the analysis you created.
To save an analysis in a document, perform the following steps:
1. Choose File > Save.
2. Enter a name for the document.
3. Choose Save.
If you create multiple analyses using the same dataset, all of the analyses are saved in the same document. You can access all of the analyses in a document through the Change drop-down list.
To add a new analysis to the document, choose in the analysis toolbar. To rename the analysis, choose and enter a new name. To delete an existing analysis from the document, choose .
Note: Results from the execution of components are not saved with analyses. To view component results, you need to execute the analysis again.
7.4 Viewing Results
To view the results of components in an analysis, after running the analysis, right-click the component, and choose View Results. The Results view is displayed.
Analyzing Data
After the successful execution of the analysis, the result of each component in the analysis is represented using different visualization charts.
To analyze data, perform the following steps:
1. After running an analysis, switch to the Results view by choosing the Results button in the toolbar.
2. From the Component Selector pane, choose the required component in the analysis to view its visualization.
By default, the result of the component is displayed in the Grid pane. You can switch to the Charts pane to view the result of the component in the corresponding visualization chart. In addition, you can build your own chart in the Visualize pane.
The following table summarizes components and their supported visualization charts.
Components: Visualization Charts
• Data Readers and Data Preparation: Scatter Matrix Chart, Statistical Summary Chart, and Parallel Coordinates
• Clustering Algorithms: Cluster Graph and Algorithm Summary
• Decision Trees: Decision Tree and Algorithm Summary
• Time Series Algorithms: Time Series Graph and Algorithm Summary
• Regression Algorithms: Regression Graph and Algorithm Summary
8.1 Visualization Charts
8.1.1 Scatter Matrix Chart
Scatter matrix charts are matrices of charts (n*n charts, where n is the number of selected attributes) used to compare data across different dimensions. By default, a maximum of four continuous attributes are selected for analysis, starting from the first attribute from the source data, and a 4*4 matrix of charts
is plotted. However, you can manually select the required attributes from the Settings option and refresh the visualization by choosing Apply.
Note: You can select a maximum of four continuous attributes in the Settings option.
8.1.2 Statistical Summary Chart
Statistical Summary provides summary information for continuous attributes in the data source. The summary information includes count, minimum value, maximum value, variance, standard deviation, sum, average, range, and number of records. A histogram chart is plotted for each attribute.
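As a rough illustration of the measures listed above, the following Python sketch computes them for one continuous attribute using only the standard library. The function name `summarize` is hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption; this guide does not state which variance formula the application applies.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Return summary measures for one continuous attribute."""
    return {
        "count": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "average": statistics.mean(values),
        "range": max(values) - min(values),
        # Assumption: sample variance and standard deviation (n - 1 denominator).
        "variance": statistics.variance(values),
        "standard_deviation": statistics.stdev(values),
    }


summary = summarize([2.0, 4.0, 4.0, 6.0])
```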
8.1.3 Parallel Coordinates
Parallel coordinates is a visualization technique used to visualize multi-dimensional data and multivariate patterns in the data for analysis.
In this chart, by default, the first five attributes are represented as vertically-spaced parallel axes. To choose the subset of attributes to be viewed in the chart, use the Settings option. Each axis is labeled with the attribute name, and minimum and maximum values for attributes. Each observation is represented as a series of connected points along the parallel axes. You can select the color by option to filter the data based on the categorical value.
Note: You can select a maximum of seven continuous attributes in the Settings option.
8.1.4 Decision Tree
A decision tree is a visualization technique that enables you to classify observations into groups and predict future events based on the set of decision rules.
This presentation is used for decision tree analysis. In this technique, a binary decision tree is built by splitting observations into smaller sub-groups until the stopping criterion is met. The leaf node indicates classified data. You can enlarge the decision tree by choosing the zoom-in button.
Note:
• The application cannot render a decision tree if there are more than 32 categorical values for a dependent column.
• The look and feel of the decision tree differs based on the algorithm vendor. For example, the decision tree for the R-CNR Tree algorithm is different from the decision tree for the HANA C4.5 algorithm.
Each node in the decision tree represents the classification of data at that level. You can view node
contents by choosing on each node.
8.1.5 Regression Chart
A regression chart is used to visualize the correlation between the dependent and independent variables. In trend mode, you can analyze the performance of the algorithm by comparing the actual dependent variables with predicted values, where dependent variables are represented as a bar graph and predicted values are represented as a line graph. In fill mode, the algorithm fills the missing values and displays the output as a line graph.
If the dataset is very large, the graph may be unclear. For better visibility of data, use the Range selector located at the bottom of the graph to select a specific data range from the large dataset. The data in the selected area is displayed in the visualization editor.
8.1.6 Time Series Chart
A time series chart enables you to visualize time series data in comparison with the fitted or predicted values from the algorithm. You can use this chart to view the data forecasted over a specified period. In trend mode, a dependent variable is represented as a bar graph and trend values are represented as a line graph. In predict mode, a dependent variable is represented as a bar graph and predicted values are represented as a line graph.
If the dataset is very large, the graph may be unclear. For better visibility of data, use the Range selector located at the bottom of the graph to select a specific data range from the large dataset. The data in the selected area is displayed in the visualization editor.
8.1.7 Cluster Chart
A cluster graph is a visualization technique that uses different charts to represent cluster information such as cluster size, cluster density and distance, cluster variable comparison, and cluster comparison.
Note: If you use the HANA K-Means algorithm to cluster observations, then only cluster size and cluster variable comparison information are represented as charts.
Cluster Size
Cluster size is the number of elements in each cluster and is represented by a horizontal bar chart.
However, you can also visualize the cluster size in a pie chart or a vertical bar chart.
Cluster Density and Distance
The distance between clusters and density of each cluster is represented by a network chart. Each node in the network represents a cluster and its size. The color of the node represents density. You can enlarge the network chart by choosing .
Cluster Variable Comparison
The comparison of the total distribution of all clusters against the distribution of each cluster is represented by a histogram. You can select the required attribute of the cluster from the variable drop-down list and change the cluster using the slider.
Cluster Comparison
The R-K-Means algorithm computes center points for each input attribute in each cluster. The comparison of each center point and cluster is represented by the radar chart. You can select the Normalize Result option to view the chart with the normalized data. In the normalized mode, the data is represented in the range of 0 to 1.
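One common way to bring values into the range of 0 to 1 is min-max scaling. The sketch below assumes that method; this guide does not say which normalization the Normalize Result option uses, and the function name `normalize` and the sample center values are hypothetical.

```python
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical cluster centers for one attribute across three clusters.
centers = [120.0, 300.0, 480.0]
scaled = normalize(centers)
```

Scaling each attribute separately in this way lets attributes with very different units share the same radar chart axis range.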
Working with Models
A model is a reusable component created by training an algorithm using historical data and saving the instance.
Typically, you create models for the following reasons:
• To share computed business rules that can be applied to similar data
• To quickly analyze results without the historical data by using the trained instance of the algorithm
9.1 Creating a Model
To create a model, you need to save the state of the algorithm.
1. Acquire data from the required data source.
The data source component is added to the analysis editor in the Predict panel.
2. In the Predict panel, double-click the required algorithm component.
3. Right-click the algorithm component and choose Configure Properties.
4. Configure the algorithm properties in the dialog box.
a. Enter the necessary values for the algorithm properties.
b. Under Model Information, choose Save the Model.
c. Enter a model name and description.
d. If you want to overwrite the existing model with a new model, select Overwrite, if exists.
e. Choose Save and Close.
5. Choose .
The model is created and appears on the Saved Models tab. You can use this model just like any other component for creating an analysis.
Note: Independent column names used while scoring the model should be the same as independent column names used while creating the model.
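The idea of saving the trained state of an algorithm and later scoring new data with it can be sketched as follows. This is a minimal illustration, not the application's storage format: the JSON layout and the names `save_model` and `score` are hypothetical. The column check mirrors the note above about matching independent column names.

```python
import json
from typing import Dict, List


def save_model(name: str, coefficients: Dict[str, float], intercept: float) -> str:
    """Serialize the trained state (here: linear coefficients) to a JSON string."""
    return json.dumps({"name": name, "coefficients": coefficients, "intercept": intercept})


def score(model_json: str, rows: List[Dict[str, float]]) -> List[float]:
    """Apply a saved model to new rows, checking the independent column names."""
    model = json.loads(model_json)
    expected = set(model["coefficients"])
    predictions = []
    for row in rows:
        if set(row) != expected:
            raise ValueError(f"expected columns {sorted(expected)}, got {sorted(row)}")
        predictions.append(
            model["intercept"] + sum(model["coefficients"][c] * row[c] for c in expected)
        )
    return predictions


saved = save_model("demo", {"store_size": 2.0}, 1.0)
predictions = score(saved, [{"store_size": 3.0}])
```

Scoring needs only the saved state, not the historical data used for training, which is why a model can be shared and reused on similar data.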
9.2 Viewing Model Information
Model information includes:
• Column details such as which columns were used while generating the model
• Summary of the algorithm
This information is helpful for data analysts to understand the structure of the model.
To view model information, perform the following steps:
1. In the Predict panel, from the Saved Models tab, double-click the required model.
The Saved Models tab appears only if the models are already saved in the repository.
2. Right-click the model and choose View Model Information.
The corresponding visualization for the algorithm selected while generating the model is displayed.
9.3 Exporting a Model as PMML
You can export the model information into a local file in industry-standard Predictive Modeling Markup Language (PMML) format and share the model with other PMML compliant applications to perform analysis on similar data.
To export a model in PMML format, perform the following steps:
1. Create a model.
2. In the Predict panel, from the Saved Models tab, double-click the required model.
3. Right-click the model and choose Export As PMML.
4. Enter a name for the file.
5. Select the file type, either PMML or XML, as required.
6. Choose Save.
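To give a feel for what a PMML document contains, the following Python sketch builds a minimal PMML regression model with the standard library. It is an assumption-laden illustration: the PMML version (4.1), the field names, and the function `linear_model_to_pmml` are chosen for the example, and the file the application actually exports may differ in version and content.

```python
import xml.etree.ElementTree as ET


def linear_model_to_pmml(x_name: str, y_name: str, slope: float, intercept: float) -> str:
    """Build a minimal PMML RegressionModel document for y = intercept + slope * x."""
    # Assumption: PMML 4.1 namespace; the exported version is not stated in this guide.
    pmml = ET.Element("PMML", {"version": "4.1", "xmlns": "http://www.dmg.org/PMML-4_1"})
    ET.SubElement(pmml, "Header", {"description": "hypothetical example model"})
    dictionary = ET.SubElement(pmml, "DataDictionary", {"numberOfFields": "2"})
    for field in (x_name, y_name):
        ET.SubElement(dictionary, "DataField",
                      {"name": field, "optype": "continuous", "dataType": "double"})
    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    ET.SubElement(schema, "MiningField", {"name": x_name})
    ET.SubElement(schema, "MiningField", {"name": y_name, "usageType": "predicted"})
    table = ET.SubElement(model, "RegressionTable", {"intercept": str(intercept)})
    ET.SubElement(table, "NumericPredictor", {"name": x_name, "coefficient": str(slope)})
    return ET.tostring(pmml, encoding="unicode")


# Hypothetical column names and coefficients.
document = linear_model_to_pmml("store_size", "turnover", 2.5, 10.0)
```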
9.4 Deleting a Model
We recommend that you use this option with caution, since deleting a model might make the analysis that contains the model's reference unusable.
To delete a model, perform the following steps:
1. In the Predict panel, choose the Saved Models tab.
2. Hover on the required model and choose the Delete icon.
Use Case Scenarios
This section provides use case scenarios that describe how you can use SAP Predictive Analysis to analyze data and forecast future events.
10.1 Sales Forecasting
Scenario: The regional manager of an airline company wants to develop strategies to increase business and fine-tune operations. Airline passenger data, such as flight date and number of passengers traveled, is stored in a CSV file. The manager would like to analyze the trend in business since 2000 and wants to forecast the number of passengers flying in the next year (for example, 2012).
This example assumes that the manager has some basic knowledge of statistical analysis and data mining techniques.
Using SAP Predictive Analysis, the manager creates a forecasting analysis. Since the airline passenger data is seasonal in nature, he selects the Triple Exponential Smoothing algorithm for forecasting.
To create an analysis for forecasting airline passengers, proceed as follows:
1. Launch the SAP Predictive Analysis application.
2. From the toolbar, choose New Document.
3. Choose CSV.
4. Choose Browse and select the Airline Passenger.csv file.
5. Choose Acquire.
6. Switch to the Predict panel.
7. From the Algorithms tab, double-click the Triple Exponential Smoothing algorithm.
The algorithm component is automatically connected to the data reader component.
8. Right-click the Triple Exponential Smoothing algorithm and choose Configure Properties.
9. In the Triple Exponential Smoothing properties dialog box, provide the necessary details:
a. Select Forecast as the output mode, as you want to forecast the data.
b. Select Airline Passenger column as the dependent column. The algorithm forecasts the data based on the Airline Passenger column.
c. In the Missing Values field, select Remove.
d. In the Period field, select Month(12).
e. Enter 2000 as the start year.
f. Enter 1 as a start period. As the period is Month(12), 1 implies first month of the year (January).
g. Enter 12 for the number of periods to predict.
h. Retain the default values for the advanced properties.
i. Choose Save and Close.
10. From the Data Writers tab, double-click the CSV Writer component.
11. Right-click the CSV Writer component and choose Configure Properties.
12. In the CSV Writer properties dialog box, select a CSV file to store the result.
13. Choose Save and Close.
14. Choose to run the analysis.
The fitted and forecast results are stored in the CSV file.
15. Switch to the analysis visualization view.
16. In the Components Selector pane, select Triple Exponential Smoothing.
By default, the results of the component are displayed in the Grid pane.
17. To view the visualization chart, switch to the Charts pane.
18. From the File menu, choose Save.
19. Enter a name for the document.
20. Choose Save.
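For readers who want to see what a seasonal forecast of this kind computes, the following Python sketch implements additive triple exponential smoothing (Holt-Winters) with a period of 12 and a 12-period forecast, matching the settings in the steps above. It is a sketch under assumptions: the smoothing parameters `alpha`, `beta`, and `gamma`, the initialization scheme, and the additive form are choices made for the example and are not taken from the application, and the input values are synthetic.

```python
from typing import List


def triple_exponential_smoothing(
    series: List[float], period: int, horizon: int,
    alpha: float = 0.3, beta: float = 0.1, gamma: float = 0.1,
) -> List[float]:
    """Forecast `horizon` future values with additive Holt-Winters smoothing."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Initial level: mean of the first season.
    level = sum(series[:period]) / period
    # Initial trend: average per-step change between the first two seasons.
    second_mean = sum(series[period:2 * period]) / period
    trend = (second_mean - level) / period
    # Initial seasonal offsets: deviation of each first-season point from its mean.
    seasonal = [series[i] - level for i in range(period)]

    for t in range(period, len(series)):
        value = series[t]
        offset = seasonal[t % period]
        previous_level = level
        level = alpha * (value - offset) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (value - level) + (1 - gamma) * offset

    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % period] for h in range(horizon)]


# Synthetic monthly pattern repeated for three years (illustrative values only).
pattern = [100.0, 95.0, 110.0, 115.0, 120.0, 140.0,
           160.0, 165.0, 130.0, 115.0, 100.0, 125.0]
history = pattern * 3
forecast = triple_exponential_smoothing(history, period=12, horizon=12)
```

Because the synthetic history repeats exactly with no trend, the forecast reproduces the monthly pattern; real passenger data would also carry a trend component.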
10.2 Retail Store Segmentation
Scenario: The country manager of a retail chain (which has 150 stores) is finalizing plans for three sales promotion strategies. Data pertaining to stores such as store location, sales turnover, store size, staff, and profit margin is stored in a CSV file. The manager wants to segment 150 stores into three different groups based on sales turnover, profit margin, store size, and staff size so that specific strategies can be applied to each store segment.
This example assumes that the country manager has some basic knowledge of statistical analysis and data mining techniques.
Using SAP Predictive Analysis, he builds a segmentation analysis by using the R-K-Means algorithm.
To build an analysis for segmentation analysis, proceed as follows:
1. Launch the SAP Predictive Analysis application.
2. From the toolbar, choose New Document.
3. Choose CSV.
4. Choose Browse and select the Retail Stores.csv file.
5. Choose Acquire.
6. Switch to the Predict panel.
7. From the Algorithms tab, double-click the R-K-Means algorithm.
The algorithm component is automatically connected to the data reader component.
8. Right-click the R-K-Means algorithm and choose Configure Properties.
9. In the R-K-Means properties dialog box, provide the necessary details:
a. Select the columns to be used for cluster analysis.
b. In the Number of Clusters field, enter 3.
c. Retain the default values for the advanced properties.
d. Choose Save and Close.
10. From the Data Writers tab, double-click the CSV Writer component.
11. Right-click the CSV Writer component and choose Configure Properties.
12. In the CSV Writer properties dialog box, select a CSV file to store the result.
13. Choose Save and Close.
14. Choose to run the analysis.
The clustering results are stored in the CSV file.
15. Switch to the Results view.
16. In the Components Selector pane, select R-K-Means.
By default, the results of the component are displayed in the Grid pane.
17. To view the visualization chart, switch to the Charts pane.
18. From the File menu, select Save.
19. Enter a name for the document.
20. Choose Save.
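The segmentation performed in the steps above can be illustrated with a small k-means implementation. The sketch below uses Lloyd's algorithm with the first k points as starting centers; both choices are assumptions made for a deterministic example, and the R-K-Means component may use a different variant and initialization. The function name `k_means` and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_means(points: List[Point], k: int, iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster points with Lloyd's algorithm; returns (labels, centers)."""
    # Assumption: the first k points seed the centers (deterministic, for illustration).
    centers = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centers


# Six hypothetical stores described by two made-up attributes.
stores = [(1.0, 1.0), (10.0, 10.0), (20.0, 1.0),
          (1.0, 2.0), (10.0, 11.0), (20.0, 2.0)]
labels, centers = k_means(stores, k=3)
```

In practice the attributes (turnover, margin, store size, staff) have different scales, so they are usually rescaled before clustering; that step is omitted here.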
Component Properties
11.1 Algorithms
Use algorithms to perform data mining and statistical analysis on your data, for example, to determine trends and patterns in data.
SAP Predictive Analysis provides built-in algorithms such as regressions, time series, and outliers. The application also supports decision trees, k-means, neural network, time series, and regression algorithms from the open-source R library. You can also perform in-database analysis using Predictive Analysis Library (PAL) algorithms from SAP HANA.
11.1.1 Regression
11.1.1.1 Exponential Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using an exponential function with the least squares method.
Note: The data type of columns used during model scoring should be the same as the data type of columns used while building the model.
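As an illustration of fitting an exponential function by least squares, the following Python sketch fits y = a * exp(b * x) by running ordinary least squares on the logarithm of the dependent values. This log-linear approach is an assumption; the guide does not describe how the built-in algorithm computes its fit. The function name `fit_exponential` and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def fit_exponential(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = a * exp(b * x) by ordinary least squares on (x, ln y)."""
    logs = [math.log(y) for y in ys]  # requires every y to be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxl / sxx
    a = math.exp(mean_log - b * mean_x)
    return a, b


# Made-up data that doubles at every step: y = 2 * 2**x.
a, b = fit_exponential([0.0, 1.0, 2.0, 3.0], [2.0, 4.0, 8.0, 16.0])
```

For this data the fit recovers a close to 2 and b close to ln 2, since the points lie exactly on an exponential curve.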
Exponential Regression Properties
Output Mode: Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Independent Column: Select the input source column with which you want to perform regression.
Dependent Column: Select the target column on which regression needs to be performed.
Missing Values: Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent column.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Predicted Column Name: Enter a name for the newly created column that contains the predicted values.
11.1.1.2 Geometric Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a geometric function with the least squares method.
Note: The data type of columns used during model scoring should be the same as the data type of columns used while building the model.
Geometric Regression Properties
Output Mode: Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Independent Column: Select the input source column with which you want to perform regression.
Dependent Column: Select the target column on which regression needs to be performed.
Missing Values: Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Predicted Column Name: Enter a name for the newly created column that contains the predicted values.
11.1.1.3 HANA Multiple Linear Regression
Use this algorithm to find the linear relationship between a dependent variable and one or more independent variables.
HANA Multiple Linear Regression Properties
Output Mode: Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Independent Columns: Select the input source columns with which you want to perform regression.
Dependent Column: Select the target column on which you want to perform regression.
Missing Values: Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Number of Threads: Enter the number of threads that can be used for execution.
Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Predicted Column Name: Enter a name for the newly created column that contains the predicted values.
11.1.1.4 Linear Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable with the least squares method.
Note: The data type of columns used during model scoring should be the same as the data type of columns used while building the model.
Linear Regression Properties
Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Output Mode
Select the input source column with which you want to perform regression.
Independent Column
Select the target column on which you want to perform regression.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
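The difference between the Fill and Trend output modes can be illustrated with a short Python sketch. The function `apply_output_mode`, the row layout, and the sample prediction function are hypothetical and only mirror the behavior described in the property table.

```python
def apply_output_mode(rows, predict, mode):
    """rows: list of (x, y) pairs where y may be None; predict: callable x -> predicted y."""
    if mode == "Fill":
        # Replace only the missing target values with predictions.
        return [(x, predict(x) if y is None else y) for x, y in rows]
    if mode == "Trend":
        # Keep the target column as-is and append a predicted-value column.
        return [(x, y, predict(x)) for x, y in rows]
    raise ValueError(f"unknown output mode: {mode}")


def predict(x):
    # Hypothetical fitted model: y = 1 + 2x.
    return 1.0 + 2.0 * x


rows = [(1, 3.0), (2, None), (3, 7.0)]
filled = apply_output_mode(rows, predict, "Fill")
trend = apply_output_mode(rows, predict, "Trend")
```

In Trend mode the extra column corresponds to the column named in Predicted Column Name.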
11.1.1.5 Logarithmic Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a logarithmic function with the least squares method.
Note: The data type of the columns used during model scoring should be the same as the data type of the columns used while building the model.
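As an illustration, a logarithmic fit can be reduced to a linear least squares fit on ln(x). The sketch below assumes the model form y = a + b * ln(x), which this guide does not state explicitly; `fit_logarithmic` and the sample data are hypothetical.

```python
import math


def fit_logarithmic(xs, ys):
    """Fit y = a + b * ln(x) by least squares on (ln x, y). Requires every x > 0."""
    ln_x = [math.log(x) for x in xs]
    n = len(xs)
    mean_u = sum(ln_x) / n
    mean_y = sum(ys) / n
    b = (sum((u - mean_u) * (y - mean_y) for u, y in zip(ln_x, ys))
         / sum((u - mean_u) ** 2 for u in ln_x))
    a = mean_y - b * mean_u
    return a, b


# Hypothetical data generated from y = 2 + 3 * ln(x).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 + 3.0 * math.log(x) for x in xs]
a, b = fit_logarithmic(xs, ys)
```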
Logarithmic Regression Properties
Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Output Mode
Select the input source column with which you want to perform regression.
Independent Column
Select the target column on which you want to perform regression.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
11.1.1.6 R-Exponential Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using an exponential function from the R open-source library.
Note: The data type of the columns used during model scoring should be the same as the data type of the columns used while building the model.
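One common way to fit an exponential trend is to linearize it by taking the logarithm of the dependent values. The Python sketch below assumes the model form y = a * exp(b * x) and fits it in log space; this is an assumption for illustration and may differ from what the R library does internally. `fit_exponential` and the data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on (x, ln y). Requires every y > 0."""
    ln_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_v = sum(ln_y) / n
    b = (sum((x - mean_x) * (v - mean_v) for x, v in zip(xs, ln_y))
         / sum((x - mean_x) ** 2 for x in xs))
    # The intercept in log space is ln(a).
    a = math.exp(mean_v - b * mean_x)
    return a, b


# Hypothetical data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Note that this approach minimizes squared error in log space, so on noisy data it weights small values more heavily than a direct nonlinear fit would.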
R-Exponential Regression Properties
Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Output Mode
Select the input source column with which you want to perform regression.
Independent Column
Select the target column on which you want to perform regression.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
A Boolean value. If set to true, the aliased coefficients are ignored in the coefficient covariance matrix. If set to false, a model with aliased coefficients produces an error.
A model with aliased coefficients signifies that the square matrix X'X is singular.
Allow Singular Fit
Select the list of contrasts to be used for factors appearing as variables in the model.
Contrasts
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
11.1.1.7 R-Geometric Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a geometric function from the R open-source library.
Note: The data type of the columns used during model scoring should be the same as the data type of the columns used while building the model.
R-Geometric Regression Properties
Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Output Mode
Select the input source column with which you want to perform regression.
Independent Column
Select the target column on which you want to perform regression.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
A Boolean value. If set to true, the aliased coefficients are ignored in the coefficient covariance matrix. If set to false, a model with aliased coefficients produces an error.
A model with aliased coefficients signifies that the square matrix X'X is singular.
Allow Singular Fit
Select the list of contrasts to be used for factors appearing as variables in the model.
Contrasts
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
11.1.1.8 R-Linear Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable by using the R open-source library.
Note: The data type of the columns used during model scoring should be the same as the data type of the columns used while building the model.
R-Linear Regression Properties
Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Output Mode
Select the input source column with which you want to perform regression.
Independent Column
Select the target column on which you want to perform regression.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
A Boolean value. If set to true, the aliased coefficients are ignored in the coefficient covariance matrix. If set to false, a model with aliased coefficients produces an error.
A model with aliased coefficients signifies that the square matrix X'X is singular.
Allow Singular Fit
Select the list of contrasts to be used for factors appearing as variables in the model.
Contrasts
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
11.1.1.9 R-Logarithmic Regression
Use this algorithm to find trends in data. This algorithm performs univariate regression analysis. It determines how an individual variable influences another variable using a logarithmic function from the R open-source library.
Note: The data type of the columns used during model scoring should be the same as the data type of the columns used while building the model.
R-Multiple Linear Regression Properties
Select the mode in which you want to display the output data.
Possible values:
• Fill: Fills missing values in the target column.
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
Output Mode
Select the input source columns with which you want to perform regression.
Independent Columns
Select the target column on which regression needs to be performed.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
Enter the confidence level of the algorithm (the accuracy of predictions).
Confidence Level
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
11.1.2 Outliers
11.1.2.1 Inter Quartile Range
Use this algorithm to find outlying values based on the statistical distribution between the first and third quartiles.
Note: The input data for the IQR algorithm must contain at least 4 rows.
Inter Quartile Range Properties
Select the mode in which you want to display the output data.
Possible values:
• Show Outliers: Adds a Boolean column to the input data specifying if the corresponding value is an outlier.
• Remove Outliers: Removes outlying values from the input data.
Output Mode
Select the input source column.
Independent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
Enter the deviation allowed for values from the inter quartile range.
Fence Coefficient
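A minimal Python sketch of interquartile-range outlier detection follows. It assumes the conventional fence rule, in which a value is an outlier if it lies more than Fence Coefficient times the IQR below the first quartile or above the third quartile; the guide does not spell out the exact rule or the quartile method used by the product. The function name `iqr_outliers` and the data are hypothetical.

```python
import statistics


def iqr_outliers(values, fence_coefficient=1.5):
    """Return one Boolean per value: True if the value lies outside the fences."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - fence_coefficient * iqr
    upper = q3 + fence_coefficient * iqr
    return [v < lower or v > upper for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
flags = iqr_outliers(data)                             # "Show Outliers" style output
kept = [v for v, is_out in zip(data, flags) if not is_out]  # "Remove Outliers" style output
```

A larger fence coefficient widens the fences and flags fewer values.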
11.1.2.2 Nearest Neighbor Outlier
Use this algorithm to find outlying values based on the number of neighbors (N) and the average distance of values compared to their nearest N neighbors.
Nearest Neighbor Outlier Properties
Select the mode in which you want to display the output data.
Possible values:
• Show Outliers: Adds a Boolean column to the input data specifying if the corresponding value is an outlier.
• Remove Outliers: Removes outlying values from the input data.
Output Mode
Select the input source column.
Independent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
Enter the number of nearest neighbors (N) to consider for each value.
Neighborhood Count
Enter the number of outliers to be removed.
Number of Outliers
Enter a name for the new column that contains the predicted values.
Predicted Column Name
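The idea behind nearest-neighbor outlier detection can be sketched as follows. The scoring rule here (average distance to the N nearest neighbors, with the highest-scoring values flagged) is an assumption based on the description above, not the product's documented algorithm; `nearest_neighbor_outliers` and the data are hypothetical.

```python
def nearest_neighbor_outliers(values, neighborhood_count, number_of_outliers):
    """Flag the values with the largest average distance to their nearest neighbors."""
    scores = []
    for i, v in enumerate(values):
        # Distances from this value to every other value, smallest first.
        distances = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        nearest = distances[:neighborhood_count]
        scores.append(sum(nearest) / len(nearest))
    # Rank indexes by score, highest first, and flag the top ones.
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:number_of_outliers])
    return [i in flagged for i in range(len(values))]


flags = nearest_neighbor_outliers([10, 11, 12, 13, 50], neighborhood_count=2, number_of_outliers=1)
```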
11.1.3 Time Series
11.1.3.1 Triple Exponential Smoothing
Use this algorithm to smooth the source data and find seasonal trends in data.
Triple Exponential Smoothing Properties
Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.
• Forecast: Displays forecasted values for the given time period.
Output Mode
Select the input column to be forecasted.
Dependent Column
Select this option to specify whether to use the date column.
Consider Date Column
Enter the name of the column that contains date values.
Date Column
Select the method to handle missing entries.
• Remove: Algorithm skips the records containing missing values in the independent column or the dependent column.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
Select the period for forecasting.
Period
Select the number of periods per year to use for forecasting. This option is only enabled if you select "Custom" for "Period".
Periods Per Year
Enter the year from which the observations are to be considered. For example, 2009, 1987, 2019.
Start Year
Enter the period from which the observations are to be considered.
Start Period
Enter the number of periods to predict.
Periods to Predict
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Enter a name for the newly created column that contains year values.
Year Values
Enter a name for the newly created column that contains quarter values.
Quarter Values
Enter a name for the newly created column that contains month values.
Month Values
Enter a name for the newly created column that contains period values.
Period Values
Enter a smoothing constant for smoothing observations (base parameters). Range: 0-1.
Alpha
Enter a smoothing constant for finding trend parameters. Range: 0-1.
Beta
Enter a smoothing constant for finding seasonal trend parameters. Range: 0-1.
Gamma
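To show how Alpha, Beta, and Gamma interact, here is a minimal Python sketch of additive triple exponential smoothing (Holt-Winters). The additive form and the simple initialization from the first two seasons are assumptions for illustration; the guide does not describe the product's exact formulation. The function name and data are hypothetical.

```python
def triple_exponential_smoothing(series, season_length, alpha, beta, gamma, periods_to_predict):
    """Additive Holt-Winters. Needs at least two full seasons of data.

    Returns (fitted, forecast): one-step-ahead fits and future predictions.
    """
    # Initial level: mean of the first season.
    level = sum(series[:season_length]) / season_length
    # Initial trend: average per-period change between the first two seasons.
    trend = sum(series[season_length + i] - series[i]
                for i in range(season_length)) / season_length ** 2
    # Initial seasonal offsets: deviation of the first season from its mean.
    seasonals = [series[i] - level for i in range(season_length)]

    fitted = []
    for t, value in enumerate(series):
        season = seasonals[t % season_length]
        fitted.append(level + trend + season)
        last_level = level
        level = alpha * (value - season) + (1 - alpha) * (level + trend)   # Alpha: base level
        trend = beta * (level - last_level) + (1 - beta) * trend           # Beta: trend
        seasonals[t % season_length] = gamma * (value - level) + (1 - gamma) * season  # Gamma: season

    n = len(series)
    forecast = [level + h * trend + seasonals[(n + h - 1) % season_length]
                for h in range(1, periods_to_predict + 1)]
    return fitted, forecast


# Hypothetical quarterly data with a pure seasonal pattern and no trend.
fitted, forecast = triple_exponential_smoothing([1, 2, 3, 4] * 3, 4, 0.5, 0.1, 0.1, 4)
```

In this sketch, `fitted` loosely corresponds to the Trend output mode and `forecast` to the Forecast output mode.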
11.1.3.2 R-Double Exponential Smoothing
Use this algorithm to smooth the source data and find trends in data.
R-Double Exponential Smoothing Properties
Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.
• Forecast: Displays forecasted values for the given time period.
Output Mode
Select the input column to be forecasted.
Dependent Column
Select the period for forecasting.
Period
Select the number of periods per year to use for forecasting. This option is only enabled if you select "Custom" for "Period".
Periods Per Year
Enter the year from which the observations are to be considered. For example, 2009, 1987, 2019.
Start Year
Enter the period from which the observations are to be considered.
Start Period
Enter the number of periods to predict.
Periods to Predict
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Enter a name for the newly created column that contains year values.
Year Values
Enter a name for the newly created column that contains quarter values.
Quarter Values
Enter a name for the newly created column that contains month values.
Month Values
Enter a name for the newly created column that contains period values.
Period Values
Enter a smoothing constant for smoothing observations (base parameters). Range: 0-1.
Alpha
Enter a smoothing constant for finding trend parameters. Range: 0-1.
Beta
Enter the confidence level of the algorithm (the accuracy of predictions).
Confidence Level
Enter the number of periodic observations required to start the calculation.
No. Periodic Observations
Enter the start value for level (a[0]) (l.start). For example: 0.4
Level
Enter the start value for finding trend parameters (b[0]) (b.start). For example: 0.4
Trend
Enter the starting values for alpha, beta, and gamma required for the optimizer. For example: 0.3, 0.1, 0.1
Optimizer Inputs
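Double exponential smoothing (Holt's linear method) tracks a level and a trend. The Python sketch below is a generic illustration, not the R implementation; the optional `level` and `trend` arguments play the role of the Level and Trend start values, and all names and defaults are hypothetical.

```python
def double_exponential_smoothing(series, alpha, beta, periods_to_predict, level=None, trend=None):
    """Holt's linear method. Returns (fitted, forecast)."""
    # Assumed defaults: start from the first observation and the first difference.
    level = series[0] if level is None else level
    trend = series[1] - series[0] if trend is None else trend

    fitted = []
    for value in series[1:]:
        fitted.append(level + trend)  # one-step-ahead prediction
        new_level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level

    forecast = [level + h * trend for h in range(1, periods_to_predict + 1)]
    return fitted, forecast


# Hypothetical data with a perfectly linear trend: y = 2t + 1.
fitted, forecast = double_exponential_smoothing([1, 3, 5, 7, 9], alpha=0.5, beta=0.5, periods_to_predict=2)
```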
11.1.3.3 R-Single Exponential Smoothing
Use this algorithm to smooth the source data.
R-Single Exponential Smoothing Properties
Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.
• Forecast: Displays forecasted values for the given time period.
Output Mode
Select the input column to be forecasted.
Dependent Column
Select the period for forecasting.
Period
Select the number of periods per year to use for forecasting. This option is only enabled if you select "Custom" for "Period".
Periods Per Year
Enter the year from which the observations are to be considered. For example, 2009, 1987, 2019.
Start Year
Enter the period from which the observations are to be considered.
Start Period
Enter the number of periods to predict.
Periods to Predict
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Enter a name for the newly created column that contains year values.
Year Values
Enter a name for the newly created column that contains quarter values.
Quarter Values
Enter a name for the newly created column that contains month values.
Month Values
Enter a name for the newly created column that contains period values.
Period Values
Enter a smoothing constant for smoothing observations (base parameters). Range: 0-1.
Alpha
Enter the confidence level of the algorithm (the accuracy of predictions).
Confidence Level
Enter the number of periodic observations required to start the calculation.
No. Periodic Observations
Enter the start value for level (a[0]) (l.start). For example: 0.4
Level
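Single exponential smoothing keeps only a level, updated with the Alpha constant. The following Python sketch is a generic illustration rather than the R implementation; the optional `level` argument plays the role of the Level start value, and the names and data are hypothetical.

```python
def single_exponential_smoothing(series, alpha, periods_to_predict, level=None):
    """Return (fitted, forecast) for simple exponential smoothing."""
    # Assumed default: start from the first observation.
    level = series[0] if level is None else level
    fitted = []
    for value in series:
        fitted.append(level)  # one-step-ahead prediction before seeing the value
        level = alpha * value + (1 - alpha) * level
    # With no trend or seasonal component, every future period gets the last level.
    forecast = [level] * periods_to_predict
    return fitted, forecast


fitted, forecast = single_exponential_smoothing([10, 20, 30], alpha=0.5, periods_to_predict=2)
```

An Alpha close to 1 makes the level follow recent observations closely; an Alpha close to 0 smooths more heavily.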
11.1.3.4 R-Triple Exponential Smoothing
Use this algorithm to smooth source data and find seasonal trends in data.
R-Triple Exponential Smoothing Properties
Select the mode in which you want to display the output.
• Trend: Displays source data along with predicted values for the given dataset.
• Forecast: Displays forecasted values for the given time period.
Output Mode
Select the input column to be forecasted.
Dependent Column
Select the period for forecasting.
Period
Select the number of periods per year to use for forecasting. This option is only enabled if you select "Custom" for "Period".
Periods Per Year
Enter the year from which the observations are to be considered. For example, 2009, 1987, 2019.
Start Year
Enter the period from which the observations are to be considered.
Start Period
Enter the number of periods to predict.
Periods to Predict
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Enter a name for the newly created column that contains year values.
Year Values
Enter a name for the newly created column that contains quarter values.
Quarter Values
Enter a name for the newly created column that contains month values.
Month Values
Enter a name for the newly created column that contains period values.
Period Values
Enter a smoothing constant for smoothing observations (base parameters). Range: 0-1.
Alpha
Enter a smoothing constant for finding trend parameters. Range: 0-1.
Beta
Enter a smoothing constant for finding seasonal trend parameters.
Gamma
Select the type of Holt-Winters exponential smoothing algorithm.
Seasonal
Enter the confidence level of the algorithm (the accuracy of predictions).
Confidence Level
Enter the number of periodic observations required to start the calculation.
No. Periodic Observations
Enter the start value for level (a[0]) (l.start). For example: 0.4
Level
Enter the start value for finding trend parameters (b[0]) (b.start). For example: 0.4
Trend
Enter start values for finding seasonal parameters (s.start). This value is dependent on the column you select. For example, if you select quarter as period, you need to provide four double values.
Season
Enter the starting values for alpha, beta, and gamma required for the optimizer. For example: 0.3, 0.1, 0.1
Optimizer Inputs
11.1.4 Decision Trees
11.1.4.1 HANA C 4.5
Use this algorithm to classify observations into groups and predict one or more discrete variables based on other variables.
HANA C 4.5 Properties
Select the mode in which you want to display the output data.
Possible values:
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
• Fill: Fills missing values in the target column.
Output Mode
Select input source columns.
Independent Columns
Select the target column.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: Algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
Enter the percentage of data to be considered for analysis.
Percentage
Enter the number of threads to be used for execution.
Number of Threads
Enter the name of the independent column containing numerical values.
Column Name
Enter bin ranges.
Enter Bin Ranges
11.1.4.2 R-CNR Tree
Use this algorithm to classify observations into groups and predict one or more discrete variables based on other variables. However, you can also use this algorithm to find trends in data.
Note:
• The "rpart" package, which is part of R 2.11.1, cannot handle column names with spaces or special characters. The "rpart" package supports only the input column name format that is supported by the R data frame.
• CNR tree does not work if the version of the caret package installed in R is earlier than 4.85.
• Independent column names used while scoring the model should be the same as the independent column names used while creating the model.
• Column names containing spaces or any special character other than period (.) are not supported.
R-CNR Tree Properties
Select the mode in which you want to display the output data.
Possible values:
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
• Fill: Fills missing values in the target column.
Output Mode
Select input source columns.
Independent Columns
Select the target column.
Dependent Column
Select the method for handling missing values.
Possible values:
• Rpart: Algorithm deletes all observations for which the dependent column is missing. However, it retains those observations for which one or more independent columns are missing.
• Remove: Algorithm skips the records containing missing values in the independent columns or the dependent column.
• Keep: Retains missing values.
• Stop: Algorithm stops execution if a value is missing in the independent column or the dependent column.
Missing Values
Select the splitting rule type.
Possible values:
• Classification: Use this method if the dependent variable has categorical values.
• Regression: Use this method if the dependent variable has continuous values.
Method
Enter the minimum number of observations required for splitting a node.
Minimum Split
Select the splitting criteria of the node.
Possible values:
• Gini: Gini impurity.
• Information: Information gain.
Split Criteria
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Enter the complexity parameter that saves computing time by preventing any split that does not improve the fit.
Complexity Parameter
Enter the maximum node level in the final tree with the root node counted as level 0.
Note: If the maximum depth is greater than 30, the algorithm does not produce the expected results on 32-bit machines.
Maximum Depth
Enter the number of cross validations. A higher cross validation value increases the computational time and produces more accurate results.
Cross Validation
Enter the vector of prior probabilities.
Prior Probability
Select the surrogate to use in the splitting process.
Possible values:
• Display Only: An observation with a missing value for the primary split rule is not sent further down the tree.
• Use Surrogate: Use this option to split subjects missing the primary variable; if all surrogates are missing, the observation is not split.
• Stop if missing: If all surrogates are missing, sends the observation in the majority direction.
Use Surrogate
Enter the style that controls the selection of the best surrogate.
Possible values:
• Use total correct classification: Algorithm uses the total number of correct classifications to find a potential surrogate variable.
• Use percent non missing cases: Algorithm uses the percentage of non-missing cases classified to find a potential surrogate.
Surrogate Style
Enter the maximum number of surrogates to be retained at each node in a tree.
Maximum Surrogate
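The two Split Criteria options listed above, Gini impurity and information gain, are standard measures of how mixed the classes in a node are. The Python sketch below shows the textbook definitions; it is an illustration and not taken from the rpart source. The function names and labels are hypothetical.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)


parent = ["a", "a", "b", "b"]
parent_gini = gini_impurity(parent)
gain = information_gain(parent, ["a", "a"], ["b", "b"])  # a perfect split
```

A candidate split is better the more it lowers impurity (Gini) or the more information it gains (Information).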
11.1.5 Neural Network
11.1.5.1 R-MONMLP Neural Network
Use this algorithm for forecasting, classification, and statistical pattern recognition using R library functions.
Note: R does not support PMML storage for MONMLP Neural Network.
R-MONMLP Neural Network Properties
Select the mode in which you want to display the output data.
Possible values:
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
• Fill: Fills missing values in the target column.
Output Mode
Select input source columns.
Independent Columns
Select the target column.
Dependent Column
Enter the number of nodes/neurons in the first hidden layer (hidden1).
Hidden Layer1 Neurons
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Select the activation function to be used for the hidden layer (Th).
Hidden Layer Transfer Function
Select the activation function to be used for the output layer (To).
Output Layer Transfer Function
Select the derivative of the hidden layer activation function (Th.prime).
Derivative of Hidden Layer Transfer Function
Select the derivative of the output layer activation function (To.prime).
Derivative of Output Layer Transfer Function
Enter the number of nodes/neurons in the second hidden layer (hidden2).
Hidden Layer2 Neurons
Enter the maximum number of iterations for the optimization algorithm (iter.max).
Maximum Iterations
Enter column indexes to which you want to apply the monotonicity constraint (monotone).
Monotone Columns
Enter the number of training iterations after which the cost function calculation stops (iter.stopped).
Training Iterations
Enter an initial weight vector (init.weights).
Initial Weights
Enter the maximum number of exceptions for the optimization routine (max.exceptions).
Maximum Exceptions
To scale dependent columns to zero mean and unit variance prior to fitting, select True (scale.y).
Scale Dependent Column
To use bootstrap aggregation, select True (bag).
Bagging Required
Enter the number of repeated trials to avoid local minima (n.trials).
Trials to Avoid Local Minima
Enter the number of ensemble members to fit (n.ensemble).
No. Ensemble Members
11.1.5.2 R-NNet Neural Network
Use this algorithm for forecasting, classification, and statistical pattern recognition using R library functions.
R-NNet Neural Network Properties
Select the mode in which you want to display the output data.
Possible values:
• Trend: Predicts the values for the dependent column and adds an extra column in the output containing the predicted values.
• Fill: Fills missing values in the target column.
Output Mode
Select input source columns.
Independent Columns
Select the target column.
Dependent Column
Select the method for handling missing values.
Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: The algorithm retains missing values for processing.
• Stop: The algorithm stops if a value is missing in the independent column or the dependent column.
Missing Values
Enter the number of nodes/neurons in the hidden layer.
Hidden Layer Neurons
Enter a name for the newly created column that contains the predicted values.
Predicted Column Name
Select the type of analysis to be done by the algorithm.
Type
To add skip-layer connections from input to output, select True.
Skip Hidden Layer
To obtain the linear output, select True. If you select the analysis type Classification, this value must be true.
Linear Output
Select True to use "log-linear model" and "maximum conditional likelihood" fittings.
linout, entropy, softmax, and censored are mutually exclusive.
Use Softmax
To use "Maximum Conditional Likelihood" fitting, select True. By default, the algorithm uses the least-squares method.
Possible values:
• True: Use the "Maximum Conditional Likelihood" fitting
• False: Use the least-squares method
Use Entropy
For softmax, a row of (0,1,1) indicates one example each of classes 2 and 3, but for censored it indicates one example whose class is known only to be 2 or 3.
Use Censored
Enter initial random weights [-rang, rang]. Set this value to 0.5 unless the input is large. If the input is large, choose the rang using the formula: rang * max(|x|) <= 1
Range
Enter a value used for calculating new weights (weight decay).
Weight Decay
Enter the maximum number of iterations allowed.
Maximum Iterations
To return the Hessian measure at the best set of weights, select True.
Hessian Matrix Required
Enter the maximum number of weights allowed in the calculation.
There is no intrinsic limit in the code, but increasing the maximum number of weights may allow fits that are very slow and time-consuming.
Maximum Weights
Enter the value that indicates the perfect fit (abstol).
Abstol
Algorithm terminates if the optimizer is unable to reduce the fit criterion by a factor of at least 1 - reltol.
Reltol
Enter the list of contrasts to be used for factors appearing as variables in the model.
Contrasts
If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Save the Model
11.1.6 Clustering
11.1.6.1 HANA K-Means
Use this algorithm to cluster observations into groups of related observations without any prior knowledge of those relationships. The algorithm clusters observations into k groups, where k is provided as an
input parameter. The algorithm then assigns each observation to clusters based on the proximity of the observation to the mean of the cluster. The process continues until the clusters converge.
Note:
• You might obtain a different cluster number for each cluster each time you execute the HANA K-Means algorithm. However, the observations in each cluster remain the same.
• Creating models using the HANA K-Means algorithm is not supported.
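The assign-and-update loop described above can be sketched in plain Python. This is a minimal illustration of the general K-Means technique, not the HANA implementation: Euclidean distance, random selection of the initial centers, and the seed parameter are assumptions. The parameter names loosely mirror the Number of Clusters, Maximum Iterations, and Exit Threshold properties.

```python
import math
import random


def k_means(points, k, max_iterations=100, exit_threshold=1e-9, seed=0):
    """Cluster `points` (tuples of numbers) into k groups.

    Returns (labels, centers). Label numbers are arbitrary: a different
    seed can number the same groups differently.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initial-center method
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= exit_threshold:  # converged
            break
    return labels, centers


observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(observations, k=2)
```

Because the cluster numbers are arbitrary, compare results by checking which observations share a label rather than by the label values themselves.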
HANA K-Means Properties
Output Mode: Select the mode in which you want to display the output data.
Independent Columns: Select the input source columns.
Missing Values: Select the method for handling missing values. Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Ignore: The algorithm ignores the records containing missing values during calculation. However, the records are retained in the result table.
• Stop: The algorithm stops if a value is missing in the independent column or the dependent column.
Number of Clusters: Enter the number of groups for clustering.
Cluster Name: Enter a name for the newly created column that contains the cluster name.
Maximum Iterations: Enter the number of iterations allowed for finding clusters.
Center Calculation Method: Select the method to be used for calculating initial cluster centers.
Normalization: To normalize the data, select True.
Number of Threads: Enter the number of threads that can be used for execution.
Exit Threshold: Enter the threshold value for exiting from the iterations.
11.1.6.2 R-K-Means
Use this algorithm to cluster observations into groups of related observations without any prior knowledge of those relationships. The algorithm clusters observations into k groups, where k is provided as an input parameter. The algorithm then assigns each observation to a cluster based on the proximity of the observation to the mean of the cluster. The process continues until the clusters converge.
Note:
• You might obtain a different cluster number for each cluster each time you execute the R-K-Means algorithm. However, the observations in each cluster remain the same.
• Creating models using the R-K-Means algorithm is not supported.
R-K-Means Properties
Output Mode: Select the mode in which you want to display the output data.
Independent Columns: Select the input source columns.
Number of Clusters: Enter the number of groups for clustering.
Cluster Name: Enter a name for the newly created column that contains the cluster name.
Maximum Iterations: Enter the number of iterations allowed for finding clusters.
Number of Initial Sets: Enter the number of random initial sets for clustering (nstart).
Algorithm: Select the type of algorithm to be used for performing K-Means clustering.
11.1.7 Association
11.1.7.1 HANA Apriori
Use this algorithm to find frequent itemset patterns in large transactional datasets for generating association rules. This algorithm is used to understand what products and services customers tend to purchase at the same time. By analyzing the purchasing trends of customers with association analysis, you can predict their future behavior.
For example, the information that a customer who buys shoes is more likely to buy socks at the same time can be represented in an association rule (with a given minimum support and minimum confidence) as: Shoes => Socks [support = 0.5, confidence = 0.1]
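To make the support, confidence, and lift figures of a rule concrete, the following Python sketch computes them for Shoes => Socks using the standard textbook definitions. The transactions are made-up sample data and the function name rule_metrics is hypothetical; the values computed by the product may be derived differently.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs => rhs.

    support    = share of transactions containing lhs and rhs together
    confidence = share of lhs transactions that also contain rhs
    lift       = confidence divided by the overall share of rhs
    """
    lhs, rhs = set(lhs), set(rhs)
    total = len(transactions)
    count_lhs = sum(1 for t in transactions if lhs <= set(t))
    count_rhs = sum(1 for t in transactions if rhs <= set(t))
    count_both = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    support = count_both / total
    confidence = count_both / count_lhs if count_lhs else 0.0
    lift = confidence / (count_rhs / total) if count_rhs else 0.0
    return support, confidence, lift


# Hypothetical shopping baskets, one list per transaction ID.
baskets = [
    ["Shoes", "Socks"],
    ["Shoes", "Socks", "Belt"],
    ["Shoes"],
    ["Socks"],
]
support, confidence, lift = rule_metrics(baskets, ["Shoes"], ["Socks"])
```

A rule is reported only if its support and confidence reach the minimum values entered in the Support and Confidence properties.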
HANA Apriori Properties
Item Column(s): Select the columns containing the items to which you want to apply the algorithm.
TransactionID Column: Select the column containing the transaction IDs to which you want to apply the algorithm.
Missing Values: Select the method for handling missing values. Possible values:
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: The algorithm retains missing values for processing.
Support: Enter a value for the minimum support of an item.
Confidence: Enter a value for the minimum confidence of rules/association.
Pre Rule: Enter a name for the new column that contains the antecedent (LHS) of the apriori rule for the given dataset.
Post Rule: Enter a name for the new column that contains the consequent (RHS) of the apriori rule for the given dataset.
Support Values: Enter a name for the new column that contains the support for the corresponding rules.
Confidence Values: Enter a name for the new column that contains the confidence values for the corresponding rules.
Lift values: Enter a name for the new column that contains the lift values for the corresponding rules.
Number of Threads: Enter the number of threads to be used for execution.
11.1.7.2 R-Apriori
Use this algorithm to find frequent itemset patterns in large transactional datasets for generating association rules using the "arules" R package. This algorithm is used to understand what products and services customers tend to purchase at the same time. By analyzing the purchasing trends of customers with association analysis, you can predict their future behavior.
For example, the information that a customer who buys shoes is more likely to buy socks at the same time can be represented in an association rule (with a given minimum support and minimum confidence) as: Shoes => Socks [support = 0.5, confidence = 0.1]
R-Apriori Properties
Output Mode: Select the mode to display the output.
Input Format: Select the format of the input data.
Item Column(s): Select the columns containing the items to which you want to apply the algorithm.
TransactionID Column: Select the column containing the transaction IDs to which you want to apply the algorithm.
Support: Enter a value for the minimum support of an item.
Confidence: Enter a value for the minimum confidence of rules/association.
Save the Model: If you want to save the state of the algorithm, select this option. To save, you need to enter a name and a description for the model.
Rules: Enter a name for the new column that contains the apriori rules for the given dataset.
Support Values: Enter a name for the new column that contains the support for the corresponding rules.
Confidence Values: Enter a name for the new column that contains the confidence values for the corresponding rules.
Lift values: Enter a name for the new column that contains the lift values for the corresponding rules.
Transaction ID: Enter a name for the new column that contains the transaction ID.
Items: Enter a name for the new column that contains the names of the items.
Matching Rules: Enter a name for the new column that contains the matching rules.
Lhs Item(s): Enter comma-separated labels for the items that appear on the left-hand side of rules or itemsets.
Rhs Item(s): Enter comma-separated labels for the items that appear on the right-hand side of rules or itemsets.
Both Item(s): Enter comma-separated labels for the items that appear on both sides of rules or itemsets.
None Item(s): Enter comma-separated labels for the items that need not appear in the rules or itemsets.
Default Appearance: Enter the default appearance of items that are not explicitly mentioned.
Sort Items: Select the sort option to sort items by their frequency.
Filter Items: Enter a numerical value that indicates how to filter unused items from transactions.
Tree View: To organize transactions as a prefix tree, select True.
Use HeapSort: To use heap sort instead of quick sort to sort transactions, select True.
Minimize Memory: To minimize memory usage instead of maximizing speed, select True.
Load Transaction: To load transactions into memory, select True.
11.1.8 Classification
11.1.8.1 HANA KNN
Use this component to classify objects based on the trained sample data. In KNN, an object is classified by the majority vote of its neighbors.
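The majority-vote idea can be sketched as follows. This is a generic illustration of k-nearest-neighbor classification, not the HANA implementation: Euclidean distance and plain (unweighted) majority voting are assumptions, and the function name knn_classify and the sample data are hypothetical. The neighborhood_count parameter corresponds to the Neighborhood Count property.

```python
import math
from collections import Counter


def knn_classify(training_rows, training_labels, query, neighborhood_count=3):
    """Return the label held by most of the nearest training rows."""
    # Rank the training rows by their distance to the query point.
    ranked = sorted(
        zip(training_rows, training_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the closest rows and take the most frequent one.
    votes = Counter(label for _, label in ranked[:neighborhood_count])
    return votes.most_common(1)[0][0]


# Hypothetical trained data: two independent columns and one dependent column.
rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
classes = ["low", "low", "low", "high", "high"]
predicted = knn_classify(rows, classes, query=(8.5, 8), neighborhood_count=3)
```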
HANA KNN Properties
Independent Columns: Select input source columns.
Neighborhood Count: Enter the number of neighbors to consider for finding distances.
Voting Type: Select the voting type.
Missing Values: Select the method for handling missing values.
• Remove: The algorithm skips the records containing missing values in the independent or dependent columns.
• Keep: The algorithm considers missing values for processing.
• Stop: The algorithm stops the execution if a value is missing in the independent column or the dependent column.
Schema Name: Enter the schema that contains the trained data.
Table Name: Enter the table that contains the trained data.
Independent Columns: Enter input columns to be considered for training data.
Dependent Column: Enter the output column to be considered for training data.
Number of Threads: Enter the number of threads to be used for execution.
Predicted Column Name: Enter a name for the new column that contains the classification values.
11.2 Data Preparation Components
Use data preparation components to prepare the data for analysis. These are optional components.
11.2.1 Formula
Use this component to apply predefined functions and operators on the data. All functions and expressions except data manipulation functions add a new column with the formula result.
Note:
• When entering a string literal that contains single quotation marks, each single quotation mark inside the string literal must be escaped with a backslash character. For example, enter 'Customer's' as 'Customer\'s'.
• When entering a column name that contains square brackets, each square bracket inside the column name must be escaped with a backslash character. For example, enter [Customer[Age]] as [Customer\[Age\]].
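If formulas are assembled programmatically before being pasted into the Expression field, the two escaping rules above can be applied with small helpers such as the following. The helper names are hypothetical and not part of the product; the sketch only performs the character replacement described in the note.

```python
def quote_string_literal(text):
    """Wrap text in single quotes, escaping embedded single quotes."""
    return "'" + text.replace("'", "\\'") + "'"


def quote_column_name(name):
    """Wrap a column name in square brackets, escaping embedded brackets."""
    escaped = name.replace("[", "\\[").replace("]", "\\]")
    return "[" + escaped + "]"


literal = quote_string_literal("Customer's")   # 'Customer\'s'
column = quote_column_name("Customer[Age]")    # [Customer\[Age\]]
```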
Formula Properties
Name: Enter a name for the new column created by applying the formula.
Expression: Enter the formula you want to apply. For example, AVERAGE([Age]).
Example: Calculating average age of employees
Employee Table:
Emp ID  Emp Name  DOB         Age  Date of Joining  Date of Confirmation
1       Laura     11/11/1986  25   12/9/2005        27/11/2005
2       Desy      12/5/1981   30   24/6/2000        10/7/2000
3       Alex      30/5/1978   33   10/10/1998       24/12/1998
4       John      6/6/1979    32   2/12/1999        20/12/1999
1. Drag the Formula component onto the analysis editor.
2. In the properties view, enter a name for the formula.
For example, Average_Age.
3. In the Expression field, enter the formula: AVERAGE([Age])
4. Choose Validate and Apply to validate the formula syntax.
Output table:
Emp ID  Emp Name  DOB         Age  Date of Joining  Date of Confirmation  Average_Age
1       Laura     11/11/1986  25   12/9/2005        27/11/2005            30
2       Desy      12/5/1981   30   24/6/2000        10/7/2000             30
3       Alex      30/5/1978   33   10/10/1998       24/12/1998            30
4       John      6/6/1979    32   2/12/1999        20/12/1999            30
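The output table shows that a column-level function such as AVERAGE produces a single value that is repeated in every row of the new column. A minimal Python sketch of that behavior, using the Age values from the Employee table (the variable names are illustrative only):

```python
# Age values taken from the Employee table above.
ages = {"Laura": 25, "Desy": 30, "Alex": 33, "John": 32}

# AVERAGE([Age]) yields one value for the whole column ...
average_age = sum(ages.values()) / len(ages)

# ... which is then written to every row of the new Average_Age column.
output_rows = [
    {"Emp Name": name, "Age": age, "Average_Age": average_age}
    for name, age in ages.items()
]
```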
Supported Functions
Where an example names a column, it refers to the Employee table.

Date functions:
• DAYSBETWEEN: Returns the number of days between two dates.
• CURRENTDATE: Returns the current system date.
• MONTHSBETWEEN: Returns the number of months between two dates. For example, the new column contains 2, 0, 2, 0 when MONTHSBETWEEN([Date of Joining],[Date of Confirmation]) is applied to the Employee table.
• DAYNAME: Returns the day name in string format. For example, the new column contains Monday, Saturday, Saturday, Thursday when DAYNAME([Date of Joining]) is applied to the Employee table.
• DAYNUMBEROFMONTH: Returns the day number of the particular month. For example, 12/11/1980 returns 12.
• DAYNUMBEROFWEEK: Returns the day number in a week. For example, Sunday=1, Monday=2.
• DAYNUMBEROFYEAR: Returns the day number in a year. For example, 1st Jan=1, 1st Feb=32, 3rd Feb=34.
• LASTDATEOFWEEK: Returns the date of the last day in a week. For example, 12/9/2005 returns 17/9/2005.
• LASTDATEOFMONTH: Returns the date of the last day in a month. For example, 12/9/2005 returns 30/9/2005.
• MONTHNUMBEROFYEAR: Returns the month number in a date. For example, Jan=1, Feb=2, Mar=3.
• WEEKNUMBEROFYEAR: Returns the week number in a year. For example, 12/9/2005 returns 38.
• QUARTERNUMBEROFDATE: Returns the quarter number in a date. For example, 12/9/2005 returns 3.

String functions:
• CONCAT: Concatenates two strings. For example, CONCAT('USA', 'Australia') returns USAAustralia.
• INSTRING: Returns true if the search string is found in the source string. For example, INSTRING('USA', 'US') returns true.
• SUBSTRING: Returns a substring from the source string. For example, SUBSTRING('USA', 1,2) returns US.
• STRLEN: Returns the number of characters in the source string. For example, STRLEN('Australia') returns 9.

Math functions:
• MAX: Returns the maximum value in a column.
• MIN: Returns the minimum value in a column.
• COUNT: Returns the number of values in a column.
• SUM: Returns the sum of the values in a column.
• AVERAGE: Returns the average of the values in a column.

Data manipulation functions:
• @REPLACE: Performs in-place replacement of a string. For example, @REPLACE([country],'USA','AMERICA') replaces USA with AMERICA in the country column.
• @BLANK: Replaces blank values with a specified value. For example, @BLANK([country], 'USA') replaces all blank values with USA in the country column.
• @SELECT: Selects rows that satisfy the given condition. You can use any conditional operator to specify the condition. For example, @SELECT([country]=='USA') selects rows where country is equal to USA.

Conditional expression:
• IF(condition) THEN(string expression/mathematical expression/conditional expression) ELSE(string expression/mathematical expression/conditional expression): Checks whether the condition is met, and returns one value if 'true' and another value if 'false'. For example, IF([Date of Joining]>12/9/2005) THEN ('Employee joined after Sept 12, 2005') ELSE ('Employee joined on or before Sept 12, 2005')
Note:
Mathematical expressions containing functions that return a numerical value are not supported. For example, the expression DAYNUMBEROFMONTH(CURRENTDATE())+2 is not supported because DAYNUMBEROFMONTH returns a numerical value.
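The documented results of several date functions can be reproduced with the Python standard library, which may help when checking expected values outside the application. This is only a sketch: the function names are hypothetical Python equivalents, and the week convention (weeks start on Sunday, and the partial week containing 1 January counts as week 1) is an assumption chosen because it matches the examples above.

```python
import calendar
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_name(d):
    return DAY_NAMES[d.weekday()]


def day_number_of_week(d):
    """Sunday = 1, Monday = 2, ..., Saturday = 7."""
    return (d.weekday() + 1) % 7 + 1


def day_number_of_year(d):
    return d.timetuple().tm_yday


def last_date_of_week(d):
    """Date of the Saturday that closes the week containing d."""
    return d + timedelta(days=7 - day_number_of_week(d))


def last_date_of_month(d):
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def week_number_of_year(d):
    """Sunday-based weeks; the week containing 1 January is week 1."""
    jan1_offset = day_number_of_week(date(d.year, 1, 1)) - 1
    return (day_number_of_year(d) + jan1_offset - 1) // 7 + 1


def quarter_number_of_date(d):
    return (d.month - 1) // 3 + 1


joined = date(2005, 9, 12)  # Laura's Date of Joining, 12/9/2005
```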
Mathematical Operators
Use mathematical operators to create formulas containing numerical columns and/or numbers. For example, the expression [Age] + 1 adds a new column with values 26, 31, 34, 33.
Mathematical Operator  Description
+                      Addition operator
-                      Subtraction operator
*                      Multiplication operator
/                      Division operator
()                     Round brackets or parentheses
^                      Power operator
%                      Modulo operator
E                      Exponential operator
Conditional Operators
Use conditional operators to create IF THEN ELSE or SELECT expressions.
Conditional Operator  Description
==                    Equal to
!=                    Not equal to
<                     Less than
>                     Greater than
<=                    Less than or equal to
>=                    Greater than or equal to
Logical Operators
Use logical operators to compare two conditions and return 'true' or 'false'. For example, IF([Date of Joining]>12/9/2005 && [Age]>=25) THEN ('True') ELSE ('False') adds a new column with values True, False, False, False.
Logical Operator  Description
&&                AND
||                OR
11.2.2 Sample
Use this component to select a subset of data from large datasets.
The Sample component supports the following sample types:
• First N: Selects the first N records in the dataset.
• Last N: Selects the last N records in the dataset.
• Every N: Selects every Nth record in the dataset, where N is an interval. For example, if N=2, the 2nd, 4th, 6th, 8th, and subsequent even-numbered records are selected.
• Simple Random: Randomly selects N records, or N percent of the records, from the dataset.
• Systematic Random: Sample intervals, or buckets, are created based on the bucket size. The Sample component selects the Nth record at random from the first bucket, and then selects the Nth record from each subsequent bucket.
Sample Properties
Sampling Type: Select the type of sampling.
Limit Rows by: Select the method for limiting the rows.
Number of Rows: Enter the number of rows to be selected.
Percentage of Rows: Enter the percentage of rows to be selected.
Bucket Size: Enter the bucket size within which a random row is selected.
Interval: Enter the interval between rows to be selected.
Maximum Rows: Enter the maximum number of rows to be selected.
Example: Selecting subset of data from a given dataset
Emp ID  Emp Name  DOB         Age
1       Laura     11/11/1986  25
2       Desy      12/5/1981   30
3       Alex      30/5/1978   33
4       John      6/6/1979    32
5       Ted       4/7/1987    24
6       Tom       30/6/1970   41
7       Anna      24/6/1965   46
8       Valerie   6/7/1990    21
9       Mary      19/9/1985   26
10      Martin    21/11/1986  25

1. First N: For N=5
Emp ID  Emp Name  DOB         Age
1       Laura     11/11/1986  25
2       Desy      12/5/1981   30
3       Alex      30/5/1978   33
4       John      6/6/1979    32
5       Ted       4/7/1987    24

2. Last N: For N=4
Emp ID  Emp Name  DOB         Age
7       Anna      24/6/1965   46
8       Valerie   6/7/1990    21
9       Mary      19/9/1985   26
10      Martin    21/11/1986  25

3. Every N: Interval=3
Emp ID  Emp Name  DOB         Age
3       Alex      30/5/1978   33
6       Tom       30/6/1970   41
9       Mary      19/9/1985   26

4. Simple Random: For number of rows=2
The result can be any two rows.
Emp ID  Emp Name  DOB         Age
7       Anna      24/6/1965   46
8       Valerie   6/7/1990    21

5. Systematic Random: Bucket Size=4
Emp ID  Emp Name  DOB         Age
2       Desy      12/5/1981   30
6       Tom       30/6/1970   41
10      Martin    21/11/1986  25
or
Emp ID  Emp Name  DOB         Age
1       Laura     11/11/1986  25
5       Ted       4/7/1987    24
9       Mary      19/9/1985   26
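The five sample types in the example above can be sketched with list slicing and the standard random module. This is an illustration of the sampling logic only, not the component's implementation; the function names and the optional seed parameter are assumptions.

```python
import random


def first_n(rows, n):
    return rows[:n]


def last_n(rows, n):
    return rows[-n:] if n > 0 else []


def every_n(rows, interval):
    """Select the Nth, 2Nth, 3Nth, ... record."""
    return rows[interval - 1::interval]


def simple_random(rows, n, seed=None):
    """Pick n distinct records at random, keeping the original order."""
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in picked]


def systematic_random(rows, bucket_size, seed=None):
    """Pick a random position in the first bucket, then reuse that
    position in every subsequent bucket."""
    rng = random.Random(seed)
    offset = rng.randrange(bucket_size)
    return rows[offset::bucket_size]


emp_ids = list(range(1, 11))  # Emp ID values 1 to 10 from the example
```

For example, first_n(emp_ids, 5) returns IDs 1 to 5 and every_n(emp_ids, 3) returns 3, 6, 9, matching the tables above. With a bucket size of 4, systematic_random returns one of 1, 5, 9 or 2, 6, 10 or 3, 7 or 4, 8.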
11.2.3 Data Type Definition
Use this component to change the name, data type, and date format of the source column. Defining the data type helps you to prepare data to make it suitable for further analysis.
For example,
• If the name of the column in the data source is "des", it may not be clear during analysis. You can change the name of the column to "Designation" in the analysis, so that the end users can easily understand it.
• If the date is stored in the mmddyy (120201, without any date separator) format, it may be considered as an integer value by the system. Using the Data Type Definition component, you can change the date format to any valid format such as mm/dd/yyyy or dd/mm/yyyy.
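The mmddyy case can be illustrated in Python: read as an integer, the value 120201 carries no date meaning, but parsed with an explicit format it becomes a date that can be written out in any valid format. This is only a sketch of the idea using the standard datetime module, not of how the component works internally.

```python
from datetime import datetime

raw_value = "120201"  # mmddyy, no separators

# Treated as a number, the value is just one hundred twenty thousand ...
as_integer = int(raw_value)

# Treated as a date with an explicit format, it is 2 December 2001.
parsed = datetime.strptime(raw_value, "%m%d%y").date()
as_mm_dd_yyyy = parsed.strftime("%m/%d/%Y")
as_dd_mm_yyyy = parsed.strftime("%d/%m/%Y")
```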
To change the name, data type, and the date format of the source column, perform the following steps:
1. Add the data type definition component into the analysis.
2. Right-click the component and choose Configure Properties.
3. To change the column name, enter an alias name for the required source column.
4. To change the data type of the column, select the required data type for the source column.
11.2.4 Filter
Use this component to filter rows and columns based on a specified condition.
Note:
• The In-DB Filter component does not support functions and advanced expressions.
• If you change the data source after configuring the filter component, the filter component still retains the previously defined row filters.
Filter Properties
Selected Columns: Select columns for analysis.
Filter Condition: Enter the filter condition.
Example: Filter out the "Store" column from the source data and apply the "Profit > 2000" condition.
Store      Revenue  Profit
Land Mark  10000    1000
Spencer    20000    4500
Soch       25000    8000
1. Uncheck the "Store" column from the Selected Columns.
2. In the Row Filter pane, choose the Profit column.
3. In the Select from Range option, enter 2000 in the From text box. The To text box should be empty.
4. Choose OK.
5. Choose Save and Close.
6. Execute the analysis.
Revenue  Profit
20000    4500
25000    8000
Note:
The Filter component supports only expressions that return a Boolean result.
For example, in the Employee table below:
Emp ID  Emp Name  DOB         Age  Date of Joining  Date of Confirmation
1       Laura     11/11/1986  25   12/9/2005        27/11/2005
2       Desy      12/5/1981   30   24/6/2000        10/7/2000
3       Alex      30/5/1978   33   10/10/1998       24/10/1998
4       John      6/6/1979    32   2/12/1999        20/12/1999
• The expression DAYSBETWEEN([Date of Joining],[Date of Confirmation]) is not a valid filter expression because it returns a numerical value. The correct usage of the DAYSBETWEEN expression in a filter is DAYSBETWEEN([Date of Joining],[Date of Confirmation]) == 14. This expression selects those rows where the number of days between "Date of Joining" and "Date of Confirmation" is 14. For the Employee table above, the third row is selected.
• DAYNAME([Date of Joining]) == 'Saturday' selects the second and third rows in the Employee table.
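The two filter expressions above can be checked against the Employee table with a short Python sketch. Each condition evaluates to a Boolean per row, which is what makes it usable as a filter. The list comprehensions stand in for the Filter component and are an illustration only.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# (Emp Name, Date of Joining, Date of Confirmation) from the table above.
employees = [
    ("Laura", date(2005, 9, 12), date(2005, 11, 27)),
    ("Desy", date(2000, 6, 24), date(2000, 7, 10)),
    ("Alex", date(1998, 10, 10), date(1998, 10, 24)),
    ("John", date(1999, 12, 2), date(1999, 12, 20)),
]

# DAYSBETWEEN([Date of Joining],[Date of Confirmation]) == 14
confirmed_after_14_days = [
    name for name, joined, confirmed in employees
    if (confirmed - joined).days == 14
]

# DAYNAME([Date of Joining]) == 'Saturday'
joined_on_saturday = [
    name for name, joined, _ in employees
    if DAY_NAMES[joined.weekday()] == "Saturday"
]
```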
Note:
• When entering a string literal that contains single quotation marks, each single quotation mark inside the string literal must be escaped with a backslash character. For example, enter 'Customer's' as 'Customer\'s'.
• When entering a column name that contains square brackets, each square bracket inside the column name must be escaped with a backslash character. For example, enter [Customer[Age]] as [Customer\[Age\]].
Supported Functions
Note:
The Filter component does not support data manipulation functions.
Where an example names a column, it refers to the Employee table.

Date functions:
• DAYSBETWEEN: Returns the number of days between two dates.
• CURRENTDATE: Returns the current system date.
• MONTHSBETWEEN: Returns the number of months between two dates. For example, the new column contains 2, 0, 2, 0 when MONTHSBETWEEN([Date of Joining],[Date of Confirmation]) is applied to the Employee table.
• DAYNAME: Returns the day name in string format. For example, the new column contains Monday, Saturday, Saturday, Thursday when DAYNAME([Date of Joining]) is applied to the Employee table.
• DAYNUMBEROFMONTH: Returns the day number of the particular month. For example, 12/11/1980 returns 12.
• DAYNUMBEROFWEEK: Returns the day number in a week. For example, Sunday=1, Monday=2.
• DAYNUMBEROFYEAR: Returns the day number in a year. For example, 1st Jan=1, 1st Feb=32, 3rd Feb=34.
• LASTDATEOFWEEK: Returns the date of the last day in a week. For example, 12/9/2005 returns 17/9/2005.
• LASTDATEOFMONTH: Returns the date of the last day in a month. For example, 12/9/2005 returns 30/9/2005.
• MONTHNUMBEROFYEAR: Returns the month number in a date. For example, Jan=1, Feb=2, Mar=3.
• WEEKNUMBEROFYEAR: Returns the week number in a year. For example, 12/9/2005 returns 38.
• QUARTERNUMBEROFDATE: Returns the quarter number in a date. For example, 12/9/2005 returns 3.

String functions:
• CONCAT: Concatenates two strings. For example, CONCAT('USA', 'Australia') returns USAAustralia.
• INSTRING: Returns true if the search string is found in the source string. For example, INSTRING('USA', 'US') returns true.
• SUBSTRING: Returns a substring from the source string. For example, SUBSTRING('USA', 1,2) returns US.

Math functions:
• MAX: Returns the maximum value in a column.
• MIN: Returns the minimum value in a column.
• COUNT: Returns the number of values in a column.
• SUM: Returns the sum of the values in a column.
• AVERAGE: Returns the average of the values in a column.

Conditional expression:
• IF(condition) THEN(string expression/mathematical expression/conditional expression) ELSE(string expression/mathematical expression/conditional expression): Checks whether the condition is met, and returns one value if 'true' and another value if 'false'. For example, IF([Date of Joining]>12/9/2005) THEN ('Employee joined after Sept 12, 2005') ELSE ('Employee joined on or before Sept 12, 2005')
Note:
Mathematical expressions containing functions that return a numerical value are not supported. For example, the expression DAYNUMBEROFMONTH(CURRENTDATE())==2 is not supported because DAYNUMBEROFMONTH returns a numerical value.
Mathematical Operators
Use mathematical operators to create formulas containing numerical columns and/or numbers. For example, the expression [Age] + 1 adds a new column with the values 26, 31, 34, 33.
11.3.1 CSV Writer
Use this component to write data to flat files such as CSV, TEXT, and DAT files.
CSV Writer Properties
File Name: Select a .csv, .dat, or .txt file.
Overwrite: To overwrite an existing file, select this option.
Column Separator: Select a column delimiter that separates data tokens in the file.
Quotation Character: Select the character to be added when writing the data.
Include Column Headers: Select this option to use the first row as column headers.
Encoding: Select the text-encoding method to be used when writing the data.
Decimal Separator: Select the character to be used for decimal representation in digit grouping.
Grouping Separator: Select the character to be used as the thousands separator.
Number Format: Enter the number format you want to apply to numerical data.
Date Time Format: Select the date format you want to apply to dates.
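To show how several of these properties interact when a flat file is written, here is a minimal Python sketch using the standard csv module. It is not the component's implementation: the function name write_csv and its parameter names are hypothetical and merely mirror the Column Separator, Quotation Character, Include Column Headers, Encoding, and Overwrite properties.

```python
import csv


def write_csv(path, headers, rows, column_separator=",",
              quotation_character='"', include_column_headers=True,
              encoding="utf-8", overwrite=False):
    """Write rows to a delimited text file."""
    # Mode "x" fails with FileExistsError instead of replacing a file.
    mode = "w" if overwrite else "x"
    with open(path, mode, newline="", encoding=encoding) as handle:
        writer = csv.writer(
            handle,
            delimiter=column_separator,
            quotechar=quotation_character,
        )
        if include_column_headers:
            writer.writerow(headers)  # first row holds the column names
        writer.writerows(rows)
```

For example, write_csv("stores.csv", ["Store", "Profit"], [["Soch", 8000]], column_separator=";") writes a header line followed by one data line, and raises FileExistsError on a second call unless overwrite=True is passed.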
11.3.2 JDBC Writer
Use this component to write data to relational databases such as MySQL, MS SQL Server, DB2, Oracle, SAP MaxDB, and SAP HANA.
JDBC Writer Properties
Database Type: Select the database type.
Database Driver Path: Enter the path to the JDBC driver. For example, to write to the Oracle database, you need to specify the location of the Oracle JDBC jar (C:\ojdbc6.jar).
Machine Name: Enter the name of the machine on which the database is installed.
Port Number: Enter the database or service port number.
Database Name: Enter the name of the database.
User Name: Enter the database user name.
Password: Enter the password for the database user.
Table Type: Enter the type of the table. This property is applicable when writing to the SAP HANA database.
Table Name: Enter the table name.
Overwrite: Select this option to overwrite the table if it already exists.
11.3.3 HANA Writer
Use this component to write data to SAP HANA database tables.
HANA Writer Component
Schema Name: Enter the name of the schema.
Table Type: Select the table type of the table to which you want to write data.
Table Name: Enter the name of the table.
Overwrite: Select this option to overwrite the table if it already exists.
11.4 Saved Models
Models that you create by saving the state of algorithms are listed under the Saved Models tab. The SAP Predictive Analysis application does not contain predefined models. Therefore, when you launch the application for the first time, the Saved Models tab does not appear.
For information on creating a new model, see the "Creating a Model" section under Working with Models.
More Information
SAP BusinessObjects product information: http://www.sap.com

SAP Help Portal: Navigate to http://help.sap.com/businessobjects and on the "SAP BusinessObjects Overview" side panel click All Products.
You can access the most up-to-date documentation covering all SAP BusinessObjects products and their deployment at the SAP Help Portal. You can download PDF versions or installable HTML libraries.
Certain guides are stored on the SAP Service Marketplace and are not available from the SAP Help Portal. These guides are listed on the Help Portal accompanied by a link to the SAP Service Marketplace. Customers with a maintenance agreement have an authorized user ID to access this site. To obtain an ID, contact your customer support representative.

SAP Service Marketplace: http://service.sap.com/bosap-support > Documentation
• Installation guides: https://service.sap.com/bosap-instguides
• Release notes: http://service.sap.com/releasenotes
The SAP Service Marketplace stores certain installation guides, upgrade and migration guides, deployment guides, release notes and Supported Platforms documents. Customers with a maintenance agreement have an authorized user ID to access this site. Contact your customer support representative to obtain an ID. If you are redirected to the SAP Service Marketplace from the SAP Help Portal, use the menu in the navigation pane on the left to locate the category containing the documentation you want to access.

Docupedia: https://cw.sdn.sap.com/cw/community/docupedia
Docupedia provides additional documentation resources, a collaborative authoring environment, and an interactive feedback channel.

Developer resources:
https://boc.sdn.sap.com/
https://www.sdn.sap.com/irj/sdn/businessobjects-sdklibrary

SAP BusinessObjects articles on the SAP Community Network: https://www.sdn.sap.com/irj/boc/businessobjects-articles
These articles were formerly known as technical papers.

Notes: https://service.sap.com/notes
These notes were formerly known as Knowledge Base articles.

Forums on the SAP Community Network: https://www.sdn.sap.com/irj/scn/forums

Training: http://www.sap.com/services/education
From traditional classroom learning to targeted e-learning seminars, we can offer a training package to suit your learning needs and preferred learning style.

Online customer support: http://service.sap.com/bosap-support
The SAP Support Portal contains information about Customer Support programs and services. It also has links to a wide range of technical information and downloads. Customers with a maintenance agreement have an authorized user ID to access this site. To obtain an ID, contact your customer support representative.

Consulting: http://www.sap.com/services/bysubject/businessobjectsconsulting
Consultants can accompany you from the initial analysis stage to the delivery of your deployment project. Expertise is available in topics such as relational and multidimensional databases, connectivity, database design tools, and customized embedding technology.