Beyond Opportunity; Enterprise Miner Ronalda Koster, Data Analyst.

Agenda

Introduction

SAS EM at Dalhousie University

Exploring SAS EM

Discussion

Introduction

Teaching Assistant with Dalhousie University

Analyst, Precision BioLogic Inc.

Consultant

Informatics at Dalhousie

Informatics: the study of the application of computer and statistical techniques to the management of information (HGSC glossary)

Dalhousie University
First marketing informatics MBA major in North America
The first to use SAS EM for teaching purposes
Health Informatics program
New Bachelor of Informatics
Success story

Other courses required for Informatics major
Multivariate statistics
Direct marketing
Marketing research
Marketing strategy
Database design
Internet marketing

Our students work for:

Small consulting companies
Large financial institutions
Not-for-profit organizations
Telecommunications companies
Insurance companies
Hospitals
Loyalty program companies
Travel companies
Oil and gas industry
Publishing houses

What they have in common: they all work with information

SEMMA Process

Sample: Input, partition and sample data
Explore: View distributions and associations
Modify: Transform data, filter outliers, cluster to derive new variables
Model: Develop models, e.g. decision trees and regression
Assess: Assess models
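As a rough illustration of the Sample step, the sketch below shuffles a dataset and splits it into training, validation and test partitions. The 60/20/20 split, the seed and the function name `partition` are assumptions made for this example; they are not Enterprise Miner defaults or settings taken from the slides.

```python
import random


def partition(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle rows and split them into train / validation / test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever is left over
    return train, valid, test


rows = list(range(100))  # stand-in for 100 customer records
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))
```

Models are fitted on the training partition, tuned on the validation partition, and checked on the test partition, which is what makes the Assess step meaningful.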

Business Problem

Have you ever wanted to understand things that occur together or in sequence?

Market Basket Analysis: Association Node

Broad applications
Basket data analysis, cross-marketing, catalog design, campaign sales analysis
Web log (click stream) analysis, DNA sequence analysis, etc.

Associations Node

Support: the probability that a transaction contains both X and Y
How frequently the combination occurs

Confidence: the conditional probability that a transaction containing X also contains Y
Percentage of cases in which Y occurs, given that X has occurred

Sequential association: Y occurs some time period after X occurs

Associations Node

If a customer purchases avocado, then 80% of the time they will also purchase steak
Confidence = 800 / 1,000 = 80%
Support = 800 / 8,000 = 10%

Diagram (avocado and steak as overlapping sets): 8,000 transactions; 1,000 contain avocados; 2,000 contain steak; 800 contain both avocados and steak

Avocado is the antecedent; steak is the consequent
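The avocado and steak figures can be reproduced by counting baskets directly. This is a minimal sketch, not the Association node's implementation: the function name `rule_stats` and the synthetic basket list are assumptions, built only so that the counts match the slide (8,000 transactions, 1,000 with avocado, 2,000 with steak, 800 with both).

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    # support = P(X and Y); confidence = P(Y | X)
    return n_both / n_total, n_both / n_antecedent


# Synthetic baskets chosen to match the counts on the slide.
baskets = (
    [{"avocado", "steak"}] * 800   # both items
    + [{"avocado"}] * 200          # avocado only (1,000 avocado baskets in total)
    + [{"steak"}] * 1_200          # steak only (2,000 steak baskets in total)
    + [set()] * 5_800              # neither item
)

support, confidence = rule_stats(baskets, "avocado", "steak")
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# prints: support = 10%, confidence = 80%
```

The sketch assumes the antecedent appears in at least one basket; otherwise the confidence is undefined and the division fails.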

Business Problem

Have you ever wanted to classify or segment data on the basis of similar attributes, so that each segment or cluster differs from the others and all objects within a cluster share traits?

Segmentation: Clustering Node

Broad applications
Demographic / psychographic segmentation, campaign segmentation, etc.

Clustering Example

Identify groups of similar objects that are dissimilar from the objects in other clusters, using disjoint cluster analysis based on Euclidean distances

Profile clusters graphically within EM

Use derived segments for further analysis / algorithms (as an input variable or a target)

Customize clusters based on standardization method, clustering method and clustering criterion
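To make "disjoint clusters on the basis of Euclidean distances" concrete, here is a small k-means style sketch with z-score standardization. It is one common way to do disjoint clustering and is offered only as an illustration; it is not claimed to be the algorithm or the defaults used by the Clustering node. The customer data (age, annual spend), the farthest-point initialization and all function names are assumptions.

```python
import math
from statistics import mean, pstdev


def standardize(points):
    """Z-score each column so no variable dominates the distance."""
    cols = list(zip(*points))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def kmeans(points, k, max_iter=100):
    """Assign each point to exactly one of k clusters by Euclidean distance."""
    # Farthest-point initialization: deterministic and spreads the seeds out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, annual spend)
customers = [(25, 200), (27, 220), (24, 180), (52, 900), (55, 950), (50, 880)]
labels, _ = kmeans(standardize(customers), k=2)
print(labels)
```

The resulting labels can then be attached to each record and used as an input variable or a target in later analysis, as the slide suggests.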

Business Problem

Have you ever wanted to predict the likelihood of an event (and assign a cost to it)?

Decision Tree Node

Broad applications
Classify observations, predict outcomes based on decision alternatives

Decision Tree Example

A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Handles missing data well
Represents the knowledge in the form of IF-THEN rules

Decision tree generation consists of two phases

Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes

Tree pruning
Identify and remove branches that reflect noise or outliers
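The construction phase and the IF-THEN reading of a tree can be sketched in a few lines. This is a simplified illustration, not the Decision Tree node's algorithm: it assumes categorical attributes, picks splits by Gini impurity, and uses a minimum-rows threshold as a crude stand-in for pruning. Missing-value handling is not implemented. The campaign records and all names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_attribute(rows, attributes, target):
    """Pick the attribute whose split gives the lowest weighted impurity."""
    def weighted_gini(attr):
        total = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            total += len(subset) / len(rows) * gini(subset)
        return total
    return min(attributes, key=weighted_gini)


def build_tree(rows, attributes, target, min_rows=2):
    """Recursively partition rows; leaves hold the majority class label."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop on a pure node, no attributes left, or too few rows to split.
    if len(set(labels)) == 1 or not attributes or len(rows) < min_rows:
        return majority
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in sorted({r[attr] for r in rows}):
        subset = [r for r in rows if r[attr] == value]
        branches[value] = build_tree(subset, remaining, target, min_rows)
    return (attr, branches)


def rules(tree, conditions=()):
    """Walk the tree and yield one IF-THEN rule per leaf."""
    if not isinstance(tree, tuple):
        yield "IF " + " AND ".join(conditions or ("TRUE",)) + f" THEN {tree}"
        return
    attr, branches = tree
    for value, subtree in branches.items():
        yield from rules(subtree, conditions + (f"{attr} = {value}",))


def predict(tree, row):
    """Follow the branches that match the row until a leaf is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical campaign history.
history = [
    {"prior_buyer": "yes", "region": "east", "responded": "yes"},
    {"prior_buyer": "yes", "region": "west", "responded": "yes"},
    {"prior_buyer": "no", "region": "east", "responded": "no"},
    {"prior_buyer": "no", "region": "west", "responded": "no"},
    {"prior_buyer": "yes", "region": "east", "responded": "yes"},
    {"prior_buyer": "no", "region": "west", "responded": "no"},
]
tree = build_tree(history, ["prior_buyer", "region"], "responded")
for rule in rules(tree):
    print(rule)
```

On this toy data `prior_buyer` separates the classes perfectly, so the tree has a single test and two leaves. `predict` raises a `KeyError` for an attribute value that never appeared in training, which a production tool would need to handle.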

Business Problem

Have you ever wanted to ensure that a campaign targets the people most likely to purchase, even though you have never contacted them before?

Scoring Node

Broad applications
Testing model scalability, applying learning for subsequent events, etc.
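Scoring means applying an already fitted model to records it has not seen and ranking them by the predicted likelihood. The sketch below assumes a logistic regression whose coefficients were estimated earlier; the coefficient values, the input variables (`web_visits`, `is_loyalty_member`) and the prospect records are all hypothetical and exist only to show the mechanics.

```python
import math

# Hypothetical coefficients from a previously fitted logistic regression.
COEFFICIENTS = {"intercept": -2.0, "web_visits": 0.8, "is_loyalty_member": 1.2}
INPUTS = ("web_visits", "is_loyalty_member")


def score(record):
    """Return the predicted purchase probability for one prospect."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * record[name] for name in INPUTS
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)


# New prospects who were not part of the training data.
prospects = [
    {"id": "A", "web_visits": 0, "is_loyalty_member": 0},
    {"id": "B", "web_visits": 3, "is_loyalty_member": 1},
    {"id": "C", "web_visits": 1, "is_loyalty_member": 0},
]

# Rank prospects from most to least likely to purchase.
ranked = sorted(prospects, key=score, reverse=True)
for p in ranked:
    print(p["id"], round(score(p), 3))
```

A campaign would then contact prospects from the top of the ranked list down, stopping where the expected return no longer covers the cost of contact.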

Lessons learned

Data cleansing and transformation take most of the time
Data analysis done using EM yields interpretable results
Data modeling techniques are very robust
SAS EM works well with huge datasets
Knowledge obtained is transferred easily
Learning never stops: EM reference, tutorial examples
You can analyze almost any kind of data
You can use SAS EM regardless of the industry and size of dataset
You need: a good computer, SAS support, and patience
While not all students use SAS in their careers, the analytical principles they learn are extremely useful for their careers