User Guide for Integrating DB Lytix™ and Excel - Teradata


User Guide for Integrating DB Lytix™ and Excel

Sunday, September 14, 2014

© 2014 Fuzzy Logix LLC

No part of this document or any of its contents may be reproduced, copied, modified or adapted, without the prior written consent of the author, unless otherwise indicated for stand-alone materials.

Teradata ADS Generator, a product within the Teradata Warehouse Miner suite, can provide a front end for the Fuzzy Logix DB Lytix™ in-database library of advanced analytic components. This document provides a tutorial on how to call and manage the different types of DB Lytix™ components from ADS Generator.



User Guide for Integrating DB Lytix™ and Excel © 2014 Fuzzy Logix LLC

Table of Contents

1: Introduction

2: Pre-Requisites

3: DB Lytix™ Function Workflow and Excel Visualization

4: Loan Default Prediction (Logistic Regression)

4.1: Import the Demo Project "Loan Default Prediction"

4.2: Map Your Input Database "FL_Train"

4.3: Open "Loan Default Prediction" Project

4.4: A Closer Look at Excel Integration

5: Twitter Buzz Prediction (Linear Regression)

5.1: Import the Demo Project "Twitter Buzz Prediction"

5.2: Map Your Input Database "FL_Train"

5.3: Open "Twitter Buzz Prediction" Project

5.4: A Closer Look at Excel Integration

6: Country Category Prediction

6.1: Import the Demo Project "Country Category Prediction"

6.2: Map Your Input Database "FL_Train"

6.3: Open "Country Category Prediction" Project

6.4: A Closer Look at Excel Integration


1. Introduction

This user guide outlines how to integrate and use Excel with Teradata ADS Generator to visualize the results returned by Fuzzy Logix DB Lytix™ functions.

Teradata ADS Generator is part of the Teradata Warehouse Miner family of products. It was built to support both comprehensive data profiling and analytic data generation for Teradata customers. For a detailed overview and usage assistance, please refer to the following:

o Teradata Warehouse Miner User Guide - Volume 1 - Introduction and Profiling Release 5.3.5 (B035-2300-064A, June 2014)

o Teradata Warehouse Miner User Guide - Volume 2 - ADS Generation Release 5.3.5 (B035-2301-064A, June 2014)

DB Lytix™, a product developed by Fuzzy Logix, offers scalable, robust, high-performance analytical methods that are embedded seamlessly into database systems. The DB Lytix™ library of statistical, machine learning, and quantitative methods provides Teradata customers with a rich set of in-database components that can be called directly from SQL or through ADS Generator.

o For a detailed description of DB Lytix™ function syntax and capabilities, please refer to “User Manual for DB Lytix™ on Teradata Advanced Package v1.1.0”.

o For help with using ADS Generator with DB Lytix™ functions, please refer to “Teradata ADS Generator & Fuzzy Logix DB Lytix™ Integration Guide, Teradata Partner Integration Lab, 8/29/2014”.

To access the listed documents, please visit: http://developer.teradata.com/applications/reference


2. Pre-Requisites

The following actions must have been completed:

You have loaded the Demo database provided by Fuzzy Logix, LLC as part of the DB Lytix™ installation. We will refer to this database as “FL_Train” in this document.

You have installed the DB Lytix™ library in “FL_Train”.

You have installed ADS Generator 5.3.5.1.

You have operational knowledge of the ADS Generator application. This guide focuses on how DB Lytix™ functions are invoked from ADS Generator and visualized in Excel.


3. DB Lytix™ Function Workflow and Excel Visualization

This user guide outlines how ADS Generator and DB Lytix™ can be used to put together a workflow for a business use-case. Excel is used to visualize the results produced by the workflows. We demonstrate the capabilities using three pre-built, frequently used analytics use-cases. For each of the use-cases described below, we provide a pre-packaged Excel template file for visualizing the results:

Loan Default Prediction (Loan Default Prediction.bin)
Business use-case: Predict how likely a customer (lendee) is to default on a loan.
DB Lytix™ functions demonstrated: FLRegrDataPrep, FLLogRegr, FLLogRegrScore, FLROCCurve, Confusion Matrix
Outputs visualized in Excel: Analysis Stats, Confusion Matrix, ROC Curve, Actual vs. Predicted Plot
Excel template name: Logistic.xlsm

Twitter Buzz Prediction (Twitter Buzz Prediction.bin)
Business use-case: Predict the magnitude of Twitter buzz based on the number of discussions, number of authors, number of interactions between the authors, etc.
DB Lytix™ functions demonstrated: FLRegrDataPrep, FLLinRegr, FLLinRegrScore
Outputs visualized in Excel: Analysis Stats, Residual Plot, Error Distribution
Excel template name: LinearRegr.xlsm

Country Category Prediction (Country Category Prediction.bin)
Business use-case: Based on country attributes such as GDP, growth and inflation, predict whether a given country is developed or not.
DB Lytix™ functions demonstrated: FLDecisionTree, FLDecisionTreeScore, Confusion Matrix
Outputs visualized in Excel: Analysis Stats, Confusion Matrix, Tree Representation
Excel template name: DecisionTree.xlsm

Next, we will look at each of the above analyses in detail.


4. Loan Default Prediction (Logistic Regression)

The “Loan Default Prediction” project provides a sample implementation that a lender can use to predict how likely a customer (lendee) is to default on a loan. You can explore this project by loading the provided “Loan Default Prediction.bin” file into your ADS Generator environment.

With the help of this project, we will review the integration between ADS Generator, DB Lytix™ and Excel.

Follow the steps below to import and run this analysis.


4.1 Import the Demo Project "Loan Default Prediction"


4.2 Map Your Input Database "FL_Train"

Your input database is the database into which you loaded the Demo data provided as part of your DB Lytix™ installation. In our case, the input database is “FL_Train”.


4.3 Open "Loan Default Prediction" Project

Once you have imported the project, you should see the following screen:

Note the following artifacts in the project explorer window on the right:

“LoanDefault” Analysis – This is the analysis that organizes the various steps in this project. We will review this in detail below.

“Logistic.xlsm” Excel spreadsheet attachment – This is the Excel template provided by Fuzzy Logix for visualizing the results of a logistic regression analysis.


Loan Default Prediction Data

For this analysis, we will be using the input table tblLoanData. This table comes pre-loaded with the Demo database for DB Lytix™.

tblLoanData

Table Column Datatype

Loanid INTEGER

loan_amnt INTEGER

funded_amnt INTEGER

term VARCHAR

apr FLOAT

int_rate FLOAT

installment FLOAT

grade VARCHAR

sub_grade VARCHAR

emp_name VARCHAR

emp_length VARCHAR

home_ownership VARCHAR

annual_inc INTEGER

is_inc_v VARCHAR

purpose VARCHAR

addr_city VARCHAR

addr_state VARCHAR

acc_now_delinq INTEGER

acc_open_past_24mths INTEGER

bc_open_to_buy INTEGER

percent_bc_gt_75 INTEGER

bc_util FLOAT

dti FLOAT

delinq_2yrs INTEGER

delinq_amnt INTEGER

earliest_cr_line VARCHAR

fico_range_low INTEGER

fico_range_high INTEGER

inq_last_6mths INTEGER

mths_since_last_delinq INTEGER

mths_since_last_record INTEGER

mths_since_recent_inq INTEGER

mths_since_recent_loan_delinq INTEGER

mths_since_recent_revol_delinq INTEGER

mths_since_recent_bc INTEGER

mort_acc INTEGER

open_acc INTEGER


pub_rec INTEGER

total_bal_ex_mort INTEGER

default_ind BYTEINT

This table has 140,325 loan records. For each loan, we have 39 attributes, including an indicator of whether the loan was defaulted on (default_ind). This indicator is the variable we will predict.

Loan Default Prediction Analysis

In the above figure, you see the ADS Generator analysis “LoanDefault”. This analysis is a workflow that goes from the input data to model building to result visualization. ADS Generator allows the user to parameterize the inputs to each step so that outputs from previous steps can be passed on to the next step. This capability is used to string the workflow together and run it with a single click. In the analysis above, the steps run sequentially from top to bottom, and each step can receive outputs from one or more of the previous steps as inputs.

We outline each of the steps below in detail:

1. We start by setting the current database. This identifies the database to which the results from this analysis will be persisted.


2. Next, a series of steps creates data samples for training and cross-validation.

3. Then we prepare the training data. This step includes the transformation of the data to the “deep form” required by DB Lytix™ functions.

This step also:

i. Removes correlated data.

ii. Makes the data sparse (there is no need to store missing values or zeros).

iii. Converts non-numeric data and categorical values to numeric values.


4. We run the logistic regression.

5. Prepare the cross-validation data. This is the step where we start to evaluate the quality of the model.


6. We score the cross-validation observations and persist the scores for subsequent analysis and visualization.

7. In order to evaluate the quality of the model, we compute some standard metrics:

i. ROC curve

ii. Confusion matrix (evaluated inside the Excel template)

iii. Additional goodness of fit statistics

8. Finally, we request the visualization of the results. The results are visualized using the Excel template (Logistic.xlsm) that is attached to the project. As the analysis executes, you will see the steps being highlighted. Once the execution is complete, Excel comes into focus with the results of the analysis.


i. Statistics Sheet – This is a summary of the logistic regression model and related statistics. Here you can observe the coefficient estimates, the goodness of fit statistics and the confusion matrix. [A confusion matrix, also known as a contingency table or an error matrix, shows the performance of the algorithm by counting the true and false positives and negatives.]

ii. ROC Curve Sheet – The ROC curve shows how quickly events are picked up when the data is ranked from the strongest prediction to the weakest; it plots the true positive rate against the false positive rate. The better the model, the steeper the initial portion of the curve and the greater its separation from the diagonal, which represents random results (i.e., no predictive power).


iii. Actual-vs-Predicted Sheet – This chart shows the relative positioning of the actual values vs. the predicted values. It is normal to see a few outliers, as in this example.
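In this project the ROC curve comes from FLROCCurve (listed for this project in Section 3) and the confusion matrix is evaluated inside the Excel template. For readers who want to check what those outputs mean, the following plain-Python sketch computes a confusion matrix at a single cut-off and the ROC points with the area under the curve. It is not the DB Lytix™ implementation; the 0.5 cut-off and the sample values are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(actual: Sequence[int], score: Sequence[float], threshold: float = 0.5) -> Dict[str, int]:
    """Count true/false positives/negatives for a 0/1 target at one score cut-off."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(actual, score):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts


def roc_points(actual: Sequence[int], score: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) as observations are taken from strongest to weakest score."""
    positives = sum(1 for y in actual if y == 1)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative observation")
    ranked = sorted(zip(score, actual), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Observations with tied scores cross the cut-off together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points; 0.5 corresponds to the random diagonal."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Illustrative values only, not output from tblLoanData.
actual = [1, 1, 0, 1, 0, 0]
score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
cm = confusion_matrix(actual, score)
points = roc_points(actual, score)
auc = area_under_curve(points)
```

Here the confusion matrix is TP=3, FP=1, TN=2, FN=0, and the area under the curve is 8/9: of the nine (positive, negative) pairs, eight have the positive ranked higher.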


4.4 A Closer Look at Excel Integration

The integration with Excel is achieved using the capability in ADS Generator to invoke external commands/programs.

This is done by adding a Run Unit of the type “External Program/Script (any)”.

In the properties for the “External Program/Script (any)” Run Unit, we specify the program to run (excel.exe) and the parameters for that program:

The “Arguments” are described below:


The actual parameter values are passed in at runtime through the following associations:

When this Run Unit is executed, ADS Generator automatically extracts the “Logistic.xlsm” attachment, opens it using excel.exe, and passes the above parameters into Excel. The Logistic.xlsm spreadsheet is pre-programmed to take the input arguments and construct the visualization by querying the model attributes from the database.
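The argument associations described above amount to substituting runtime values into a fixed argument list before the program is launched. The sketch below shows that substitution with Python's string.Template. The placeholder names $database and $model_table and their values are hypothetical, and because this guide does not document how Logistic.xlsm reads its arguments, the sketch stops at building the command list rather than launching Excel.

```python
from string import Template
from typing import Dict, List


def resolve_arguments(arg_templates: List[str], associations: Dict[str, str]) -> List[str]:
    """Replace $placeholders with runtime values; raises KeyError if one is missing."""
    return [Template(arg).substitute(associations) for arg in arg_templates]


def build_command(program: str, workbook: str, args: List[str]) -> List[str]:
    """Assemble the external command: program, workbook to open, then the arguments."""
    return [program, workbook, *args]


# Hypothetical placeholder names and values; the real ones are set in the Run Unit properties.
arg_templates = ["$database", "$model_table"]
associations = {"database": "MyResults", "model_table": "LoanDefaultModel"}
command = build_command("excel.exe", "Logistic.xlsm", resolve_arguments(arg_templates, associations))
```

Using substitute rather than safe_substitute makes a missing association fail loudly instead of passing an unresolved placeholder through to the spreadsheet.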


5. Twitter Buzz Prediction (Linear Regression)

The “Twitter Buzz Prediction” project provides a sample implementation that can be used to predict the magnitude of Twitter buzz based on several data points, such as the number of discussions on a topic, the number of authors, the number of interactions between the authors, etc. You can explore this project by loading the provided “Twitter Buzz Prediction.bin” file into your ADS Generator environment.

With the help of this project, we will review the integration between ADS Generator, DB Lytix™ and Excel.

Follow the steps below to import and run this analysis.


5.1 Import the Demo Project "Twitter Buzz Prediction"


5.2 Map Your Input Database "FL_Train"

Your input database is the database into which you loaded the Demo data provided as part of your DB Lytix™ installation. In our case, the input database is “FL_Train”.


5.3 Open "Twitter Buzz Prediction" Project

Once you have imported the project, you should see the following screen:

Note the following artifacts in the project explorer window on the right:

“TwitterBuzz” Analysis – This is the analysis that organizes the various steps in this project. We will review this in detail below.

“LinearRegr.xlsm” Excel spreadsheet attachment – This is the Excel template provided by Fuzzy Logix for visualizing the results of a linear regression analysis.


“Twitter Buzz Prediction” Data

For this analysis, we will be using the input table tblTwitterBuzz. This table comes pre-loaded with the Demo database for DB Lytix™.

tblTwitterBuzz

Table Column Datatype Table Column Datatype

OBSID INTEGER Attention_Level_Contributors_4 FLOAT

Created_Discussions_0 INTEGER Attention_Level_Contributors_5 FLOAT

Created_Discussions_1 INTEGER Attention_Level_Contributors_6 FLOAT

Created_Discussions_2 INTEGER Contribution_Sparseness_0 FLOAT

Created_Discussions_3 INTEGER Contribution_Sparseness_1 FLOAT

Created_Discussions_4 INTEGER Contribution_Sparseness_2 FLOAT

Created_Discussions_5 INTEGER Contribution_Sparseness_3 FLOAT

Created_Discussions_6 INTEGER Contribution_Sparseness_4 FLOAT

Author_Increase_0 INTEGER Contribution_Sparseness_5 FLOAT

Author_Increase_1 INTEGER Contribution_Sparseness_6 FLOAT

Author_Increase_2 INTEGER Author_Interaction_0 FLOAT

Author_Increase_3 INTEGER Author_Interaction_1 FLOAT

Author_Increase_4 INTEGER Author_Interaction_2 FLOAT

Author_Increase_5 INTEGER Author_Interaction_3 FLOAT

Author_Increase_6 INTEGER Author_Interaction_4 FLOAT

Attention_Level_Authors_0 FLOAT Author_Interaction_5 FLOAT

Attention_Level_Authors_1 FLOAT Author_Interaction_6 FLOAT

Attention_Level_Authors_2 FLOAT Num_Authors_0 INTEGER

Attention_Level_Authors_3 FLOAT Num_Authors_1 INTEGER

Attention_Level_Authors_4 FLOAT Num_Authors_2 INTEGER

Attention_Level_Authors_5 FLOAT Num_Authors_3 INTEGER

Attention_Level_Authors_6 FLOAT Num_Authors_4 INTEGER

Brustiness_Level_0 FLOAT Num_Authors_5 INTEGER

Brustiness_Level_1 FLOAT Num_Authors_6 INTEGER

Brustiness_Level_2 FLOAT Avg_Length_Discussion_0 FLOAT

Brustiness_Level_3 FLOAT Avg_Length_Discussion_1 FLOAT

Brustiness_Level_4 FLOAT Avg_Length_Discussion_2 FLOAT

Brustiness_Level_5 FLOAT Avg_Length_Discussion_3 FLOAT

Brustiness_Level_6 FLOAT Avg_Length_Discussion_4 FLOAT

Atomic_Containers_0 INTEGER Avg_Length_Discussion_5 FLOAT

Atomic_Containers_1 INTEGER

Atomic_Containers_2 INTEGER Avg_Length_Discussion_6 FLOAT

Atomic_Containers_3 INTEGER Avg_Num_Discussions_0 INTEGER

Atomic_Containers_4 INTEGER Avg_Num_Discussions_1 INTEGER

Atomic_Containers_5 INTEGER Avg_Num_Discussions_2 INTEGER

Atomic_Containers_6 INTEGER Avg_Num_Discussions_3 INTEGER

Attention_Level_Contributors_0 FLOAT Avg_Num_Discussions_4 INTEGER

Attention_Level_Contributors_1 FLOAT Avg_Num_Discussions_5 INTEGER


Attention_Level_Contributors_2 FLOAT Avg_Num_Discussions_6 INTEGER

Attention_Level_Contributors_3 FLOAT Buzz_Magnitude FLOAT

This table has 583,250 records. The attribute “Buzz_Magnitude” is the dependent variable. Following is a brief description of the independent attributes. Each attribute's value is captured at 7 distinct times, represented by the _0 through _6 suffixes.

Attribute Name – Description

Attention Level (measured with number of authors) – This feature is a measure of the attention paid to the instance's topic on social media.

Burstiness Level – The burstiness level for a topic z at a time t is defined as the ratio of ncd and nad.

Number of Atomic Containers – This feature measures the total number of atomic containers generated through the whole social media on the instance's topic until time t.

Attention Level (measured with number of contributions) – This feature is a measure of the attention paid to the instance's topic on social media.

Contribution Sparseness – This feature is a measure of the spreading of contributions over discussions for the instance's topic at time t.

Author Interaction – This feature measures the average number of authors interacting on the instance's topic within a discussion.

Number of Authors – This feature measures the number of authors interacting on the instance's topic at time t.

Average Discussions Length – This feature directly measures the average length of a discussion belonging to the instance's topic.

Average Number of Discussions – This feature measures the number of discussions involving the instance's topic until time t.
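Because each attribute in tblTwitterBuzz is stored as seven columns (suffixes _0 through _6), it can help, when inspecting records outside ADS Generator, to regroup a wide row into one series per attribute. The following plain-Python sketch does that by parsing the numeric suffix. The attribute list is taken from the table above (including the Brustiness_Level spelling used there); the function itself is a hypothetical helper, not part of DB Lytix™.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A column ending in _<digits> is one attribute observed at one time index.
TIME_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<t>\d+)$")

# The eleven attribute families in tblTwitterBuzz, each stored as _0 through _6.
BASES = [
    "Created_Discussions", "Author_Increase", "Attention_Level_Authors",
    "Brustiness_Level", "Atomic_Containers", "Attention_Level_Contributors",
    "Contribution_Sparseness", "Author_Interaction", "Num_Authors",
    "Avg_Length_Discussion", "Avg_Num_Discussions",
]
COLUMNS = ["OBSID"] + [f"{b}_{t}" for b in BASES for t in range(7)] + ["Buzz_Magnitude"]


def to_time_series(row: Dict[str, float]) -> Dict[str, List[float]]:
    """Group one wide record into {attribute: [value at _0, _1, ...]}."""
    by_time: Dict[str, Dict[int, float]] = defaultdict(dict)
    for column, value in row.items():
        match = TIME_SUFFIX.match(column)
        if match:  # OBSID and Buzz_Magnitude have no numeric suffix and are skipped
            by_time[match.group("base")][int(match.group("t"))] = value
    return {base: [times[t] for t in sorted(times)] for base, times in by_time.items()}
```

With 11 attributes at 7 time points plus OBSID and Buzz_Magnitude, the table has 79 columns in total.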

“Twitter Buzz Prediction” Analysis

In the above figure, you see the ADS Generator analysis “TwitterBuzz”. This analysis is a workflow that goes from the input data to model building to result visualization. We outline each of the steps below in detail:


1. We start by setting the current database. This identifies the database to which the results from this analysis will be persisted.

2. Then we prepare the training data. This step includes the transformation of the data to the “deep form” required by DB Lytix™ functions.

This step also:

i. Removes correlated data.

ii. Makes the data sparse (there is no need to store missing values or zeros).

iii. Converts non-numeric data and categorical values to numeric values.

3. We run the linear regression.

4. We score the observations and persist the scores for subsequent analysis and visualization.


5. In order to evaluate the quality of the model, we compute some standard metrics:

i. Residual Analysis

ii. Histogram

6. Finally, we request the visualization of the results. The results are visualized using the Excel template (LinearRegr.xlsm) that is attached to the project. As the analysis executes, you will see the steps being highlighted. Once the execution is complete, Excel comes into focus with the results of the analysis.

i. Statistics Sheet – This is a summary of the linear regression model and related statistics. Here you can observe the model statistics, the coefficient estimates and the analysis of variance.


ii. Residual Plot – The residual plot shows the residuals on the vertical axis and the independent variable on the horizontal axis. Because the points in this residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for this data.

iii. Actual-vs-Predicted Sheet – This chart shows the relative positioning of the actual values vs. the predicted values. The points are distributed symmetrically around the diagonal line, which indicates a linear relationship and a good model fit for this data.


iv. Distribution of Error
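Step 5 above computes a residual analysis and a histogram, which feed the Residual Plot and Distribution of Error sheets. A plain-Python sketch of the two calculations follows: each residual is the actual value minus the predicted value, and the residuals are counted into equal-width bins. The bin count and the sample values are assumptions; LinearRegr.xlsm may bin the errors differently.

```python
from typing import List, Sequence, Tuple


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual = actual value minus predicted value, one per observation."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def histogram(values: Sequence[float], bins: int) -> List[Tuple[float, float, int]]:
    """Equal-width bins over [min, max]; the maximum value is counted in the last bin."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c) for i, c in enumerate(counts)]


# Illustrative values only, not scores from tblTwitterBuzz.
actual = [3.0, 5.0, 2.0, 8.0]
predicted = [2.5, 5.5, 2.0, 7.0]
errors = residuals(actual, predicted)
error_bins = histogram(errors, 3)
```

A roughly symmetric histogram centred on zero is what you would hope to see for a well-specified linear model.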


5.4 A Closer Look at Excel Integration

The integration with Excel is achieved using the capability in ADS Generator to invoke external commands/programs.

This is done by adding a Run Unit of the type “External Program/Script (any)”.

In the properties for the “External Program/Script (any)” Run Unit, we specify the program to run (excel.exe) and the parameters for that program:

The “Arguments” are described below:


The actual parameter values are passed in at runtime through the following associations:

When this Run Unit is executed, ADS Generator automatically extracts the “LinearRegr.xlsm” attachment, opens it using excel.exe, and passes the above parameters into Excel. The LinearRegr.xlsm spreadsheet is pre-programmed to take the input arguments and construct the visualization by querying the model attributes from the database.


6. Country Category Prediction

The “Country Category Prediction” project provides a sample implementation that can be used to predict whether a given country is developed, based on country attributes such as GDP, growth and inflation. You can explore this project by loading the provided “Country Category Prediction.bin” file into your ADS Generator environment.

With the help of this project, we will review the integration between ADS Generator, DB Lytix™ and Excel.

Follow the steps below to import and run this analysis.


6.1 Import the Demo Project "Country Category Prediction"


6.2 Map Your Input Database "FL_Train"

Your input database is the database into which you loaded the Demo data provided as part of your DB Lytix™ installation. In our case, the input database is “FL_Train”.


6.3 Open "Country Category Prediction" Project

Once you have imported the project, you should see the following screen:

Note the following artifacts in the project explorer window on the right:

“CountryCategoryPredict” Analysis – This is the analysis that organizes the various steps in this project. We will review this in detail below.

“DecisionTree.xlsm” Excel spreadsheet attachment – This is the Excel template provided by Fuzzy Logix for visualizing the results of a decision tree analysis.


“Country Category Prediction” Data

For this analysis, we will be using the input table tblCountryGDP2010. This table comes pre-loaded with the Demo database for DB Lytix™.

Table tblCountryGDP2010

Table Column Datatype

id INTEGER

countryname VARCHAR

developed INTEGER

yr INTEGER

region VARCHAR

gdpgrowth FLOAT

inflation FLOAT

gdfp FLOAT

This table has 171 records. The attribute “developed” is the variable we will predict based on the other attributes of each country.

“Country Category Prediction” Analysis

In the above figure, you see the ADS Generator analysis “CountryCategoryPredict”. This analysis is a workflow that goes from the input data to model building to result visualization. We outline each of the steps below in detail:

1. We start by setting the current database. This identifies the database to which the results from this analysis will be persisted.


2. Then we prepare the training data. This step includes the transformation of the data to the “deep form” required by DB Lytix™ functions.

This step also:

i. Removes correlated data.

ii. Makes the data sparse (there is no need to store missing values or zeros).

iii. Converts non-numeric data and categorical values to numeric values.

3. We train the decision tree.

4. We score the observations and persist the scores for subsequent analysis and visualization.

5. We compute the confusion matrix in order to help us evaluate the quality of the model.


6. Finally, we request the visualization of the results. The results are visualized using the Excel template (DecisionTree.xlsm) that is attached to the project. As the analysis executes, you will see the steps being highlighted. Once the execution is complete, Excel comes into focus with the results of the analysis.

i. Model Statistics

ii. Confusion Matrix

iii. Tree representation of the Decision Tree model built
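The “deep form” preparation in step 2 (also used in Sections 4 and 5) stores one row per observation and variable instead of one column per variable. The following plain-Python sketch shows the general idea on rows shaped like tblCountryGDP2010: numeric columns are kept only when non-zero and non-missing, and a categorical column such as region becomes one 0/1 indicator variable per level, so only the 1s are stored. The (observation id, variable id, value) layout, the variable numbering, the region names and the sample values are assumptions for illustration; the actual DB Lytix™ tables may differ, and the removal of correlated variables is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

DeepRow = Tuple[int, int, float]  # (observation id, variable id, value)


def to_deep_form(
    rows: Sequence[Dict[str, object]],
    id_col: str,
    numeric: Sequence[str],
    categorical: Sequence[str],
) -> Tuple[List[DeepRow], Dict[str, int]]:
    """Convert wide rows to sparse (observation id, variable id, value) triples."""
    var_ids: Dict[str, int] = {}
    for col in numeric:
        var_ids[col] = len(var_ids) + 1
    # One indicator variable per category level, numbered in sorted order.
    for col in categorical:
        levels = sorted({str(r[col]) for r in rows if r.get(col) is not None})
        for level in levels:
            var_ids[f"{col}={level}"] = len(var_ids) + 1

    deep: List[DeepRow] = []
    for row in rows:
        obs_id = int(row[id_col])
        for col in numeric:
            value = row.get(col)
            # Sparse storage: skip missing values and zeros.
            if value is None or value == 0 or (isinstance(value, float) and math.isnan(value)):
                continue
            deep.append((obs_id, var_ids[col], float(value)))
        for col in categorical:
            level = row.get(col)
            if level is not None:
                deep.append((obs_id, var_ids[f"{col}={level}"], 1.0))
    return deep, var_ids


# Illustrative rows only, not records from tblCountryGDP2010.
rows = [
    {"id": 1, "region": "Europe", "gdpgrowth": 2.0, "inflation": 0.0},
    {"id": 2, "region": "Asia", "gdpgrowth": None, "inflation": 3.5},
]
deep, var_ids = to_deep_form(rows, "id", ["gdpgrowth", "inflation"], ["region"])
```

The zero inflation value in row 1 and the missing growth value in row 2 simply produce no deep rows, which is what makes the representation sparse.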


6.4 A Closer Look at Excel Integration

The integration with Excel is achieved using the capability in ADS Generator to invoke external commands/programs.

This is done by adding a Run Unit of the type “External Program/Script (any)”.

In the properties for the “External Program/Script (any)” Run Unit, we specify the program to run (excel.exe) and the parameters for that program:

The “Arguments” are described below:


The actual parameter values are passed in at runtime through the following associations:

When this Run Unit is executed, ADS Generator automatically extracts the “DecisionTree.xlsm” attachment, opens it using excel.exe, and passes the above parameters into Excel. The DecisionTree.xlsm spreadsheet is pre-programmed to take the input arguments and construct the visualization by querying the model attributes from the database.
