091104-M Hornick-Slides-Oracle Data Mining Case Study

Page 2: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Copyright © 2009 Oracle Corporation

Oracle Data Mining for Text, Clustering, and Classification:

Case Study of a Recommendation Engine

Mark Hornick, Senior Manager, Development

Pablo Tamayo, Consulting MTS

Data Mining Technologies Group

Page 3: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Introduction

Recommendation Engine at Oracle OpenWorld Conference, 2008 and 2009

Recommend conference sessions to attendees

Enhance session enrollment application

Use Oracle Data Mining and Oracle Data Miner UI

K-means, Naïve Bayes, Text Mining, Code Generation

Page 4: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Agenda

Recommendation engine scenario

Overview

Technical problem and data

Methodology for OOW '08 and '09

Evaluating recommendation quality

New features for OOW '09

Demonstration

OOW'08 results and summary

Page 6: 091104-M Hornick-Slides-Oracle Data Mining Case Study

High Level Objectives

Help attendees find relevant sessions

Maximize individual OOW experience / value

Increase session attendance

Page 7: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Technical Objectives and Constraints

Recommend 2009 sessions before any history exists of who registered for any 2009 sessions

Use no session ratings data from attendees

Recommend sessions by relative preference

Recommend exhibitors and demos for attendees

Identify top N related sessions to a given session

Use an automated data mining-based solution

Page 8: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Approach

Deduction: Query refinement

Users specify what they want to retrieve

Induction: Model-based recommendation engine

Recommend sessions most relevant to attendee profile

Improve likelihood of finding sessions of interest

…enhance Schedule Builder tool with Oracle Data Mining-generated session recommendations

Page 9: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Enrollment Application – Schedule Builder

Page 10: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Oracle Data Mining

Automatically sifts through data to find hidden patterns, discover new insights, and make predictions

Wide range of capabilities

Predict customer behavior (Classification)

Predict or estimate a value (Regression)

Group similar documents (Clustering and Text Mining)

Identify factors that determine an outcome (Attribute Importance)

Find profiles of targeted people or items (Decision Trees)

Determine important relationships and “market baskets” (Associations)

Extract higher-level text features (Feature Extraction)

Find fraud or “rare events” (Anomaly Detection)

…and others

Oracle Data Miner user interface supporting guided analytics

Page 11: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Approach – 30,000 ft.

2008 Data: Sessions, Attendees, Attendance

Model

Build

Apply

2009 Data: Sessions, Attendees

New attendee registers and completes survey

Ranked Session Recommendations for each Attendee

Page 12: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Approach – 30,000 ft.

2009 Session recommendations filtered by user criteria

Attendee logs into Schedule Builder

Ranked Sessions retrieved

Ranked Session Recommendations for Attendees

Page 13: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Success Metrics

Conversion rate

% attendees who used at least 1 recommendation

Enrollment vs. actual attendance

Test Metrics

Enrichment curve

Global measure of merit

Page 15: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Conference Session Recommendation Problem

Sessions are single use

No two are exactly alike conference to conference

Sessions have no history and no future

Don't know who will attend a given session until after the session

No rating information available, attendance only

Infer preferences using higher level projections

Session themes

Attendee profiles

Page 16: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Conference Data: OOW '08

Sessions (1850+)

Title, abstract, track(s)

Attendees (41700+)

Survey questions, position, product usage

Attendance (206700+)

Who attended which sessions

Page 17: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Attendee Interests from OOW'08 registration survey

Applications

Fusion

Agile

EBS

Hyperion

PeopleSoft

Siebel

JD Edwards

On Demand

App Integration Architecture

Development and Management

Technology

Business Intelligence

Security

SOA, BPM, Web Services, App Server

Content Management, Collaboration, Web 2.0

Predictive Analytics, Data Mining

Database

Enterprise Management

Identity Management

Warehousing

Performance / Scalability, GRID / RAC

High Availability

Middleware

Product Area

Customer Relationship Management

Governance, Risk, and Compliance

Master Data Management

Fulfillment (order management / logistics)

Supply Chain Management / Planning

Human Capital Management

Procurement

Project Management

Business Intelligence

Development

.Net

Database

Java

Fusion Development

Service-Oriented Architecture

Tools Development and Management

Product Lifecycle Management

Asset Lifecycle Management

Enterprise Performance Management

Financial Management

Strategy

Oracle Services

Oracle Consulting

Oracle Support

Oracle University

Oracle Linux Support

Automotive

Chemicals

Communications

Consumer Goods

Natural Resources

Oil and Gas

Professional Services

Public Sector

Retail

Travel and Transportation

Education and Research

Engineering, Construction and Real Estate

Financial Services

Healthcare

High Tech

Industrial Manufacturing

Life Sciences

Media and Entertainment

Industry

…and others

Oracle Advanced Customer Services

Oracle On Demand

BEA

Primavera

Page 18: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Data Preparation

Sessions

Concatenate relevant columns to facilitate text mining

Attendance

Remove duplicates

Attendees

Synonyms in attribute values, e.g., state = OH and Ohio

Incomplete data, e.g., region = null

Multi-valued attributes requiring parsing, e.g., member of user groups separated by ';' or '/'

Map data columns between 2008 and 2009

e.g., Advanced customer services split between Apps and Tech

Free form columns, e.g., job title = Vice President, V.P., VP
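The attendee clean-up steps listed above (synonyms, missing values, multi-valued attributes) can be sketched roughly as follows. This is an illustrative Python sketch, not the SQL used for the conference; the synonym table and the function names are hypothetical.

```python
import re

# Hypothetical synonym table mapping raw values to one canonical form.
STATE_SYNONYMS = {"ohio": "OH", "oh": "OH"}

def normalize_state(raw):
    """Map synonyms such as 'Ohio' and 'OH' to one value; return None for missing data."""
    if raw is None or not raw.strip():
        return None
    cleaned = raw.strip()
    return STATE_SYNONYMS.get(cleaned.lower(), cleaned)

def split_multi_valued(raw):
    """Split a multi-valued field on ';' or '/' and drop empty pieces."""
    if not raw:
        return []
    return [part.strip() for part in re.split(r"[;/]", raw) if part.strip()]

print(normalize_state("Ohio"))
print(split_multi_valued("IOUG; OAUG/ODTUG"))
```

Each parsed user-group value would then typically become its own 0/1 column, in the spirit of the UG_MEM_* attributes that appear later in the deck.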

Page 19: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Free Form Fields: Job Title Example

create table ATTENDEE09_PREP as
select
  -- 0/1 indicator per job-title keyword; 'V.P.' sets both the president and vice flags
  case when a.job_title like '%Manager%' then 1 else 0 end job_title_manager,
  case when a.job_title like '%President%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_president,
  case when a.job_title like '%Vice%' or a.job_title like '%V.P.%' then 1 else 0 end job_title_vice
from ATTENDEE09 a
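The same keyword-to-indicator idea can be sketched outside the database. The Python below is an illustration only; the mapping table and function name are hypothetical, and the matching is case-sensitive substring matching, as with SQL LIKE.

```python
# Hypothetical keyword-to-flag mapping; 'V.P.' sets both the president and
# vice flags, mirroring the intent of the CASE expressions on the slide.
KEYWORD_FLAGS = {
    "Manager": ["job_title_manager"],
    "President": ["job_title_president"],
    "Vice": ["job_title_vice"],
    "V.P.": ["job_title_president", "job_title_vice"],
}

def job_title_flags(job_title):
    """Return a dict of 0/1 indicator columns for a free-form job title."""
    flags = {name: 0 for names in KEYWORD_FLAGS.values() for name in names}
    for keyword, names in KEYWORD_FLAGS.items():
        if keyword in (job_title or ""):
            for name in names:
                flags[name] = 1
    return flags

print(job_title_flags("V.P. of Sales"))
```

A spelling such as "VP" would not match any keyword here, just as it would not match the LIKE patterns on the slide, so real data would likely need more variants.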

Page 21: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Methodology

2008 Sessions: Cluster Sessions to produce 2008 Session Clusters (themes)

2008 Attendees: Build classification model to predict clusters for attendees, then score attendees for each cluster

New 2009 Attendees: New 2009 Attendee Cluster Scores Vector

New 2009 Sessions: New 2009 Sessions Cluster Scores Vectors

Vector multiply each attendee's cluster scores against each session's cluster scores for total order ranking of recommendations

Ranked Session Rec's (example scores: .86, .73, .66)

Page 22: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Model Building and Scoring Details

Cluster sessions

Concatenate all session-related text

Text Mining data preparation – create text index

Lexer with stemming

Custom “stopword” list

Page 23: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Session S291749

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management

1. Perform stemming (example)

Stems: integrate, account, process, develop, integrate, invoice, account, utilize

Page 24: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Session S291749

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management

1. Perform stemming (example)

2. Remove stopwords

[Figure: stopwords crossed out in the session text; remaining stems: integrate, account, process, develop, integrate, invoice, account, utilize]
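The two steps on this slide, stemming and stopword removal, can be sketched in a few lines. This is a toy Python illustration, not Oracle Text's lexer: the stem lookup table is hypothetical and tiny, and only 'your' and 'oracle' are stopwords named elsewhere in the deck; the rest of the stoplist is assumed.

```python
import re

# Assumed stoplist: 'your' and 'oracle' appear in the deck; the others are illustrative.
STOPWORDS = {"in", "this", "how", "to", "with", "your", "by", "and", "oracle"}

# Hypothetical stem lookup; a real lexer would apply a full stemmer.
STEMS = {
    "integrating": "integrate",
    "accounts": "account",
    "utilizing": "utilize",
    "developed": "develop",
    "invoices": "invoice",
    "processing": "process",
}

def stem_and_filter(text):
    """Lowercase and tokenize, map tokens to stems, then drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    stemmed = [STEMS.get(tok, tok) for tok in tokens]
    return [tok for tok in stemmed if tok not in STOPWORDS]

print(stem_and_filter("Integrating Oracle Accounts Payable with Oracle Imaging"))
```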

Page 25: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Creating a Text Index, Stoplist, Lexer Using Oracle Text

begin
  -- Lexer that indexes both the original tokens and their English stems
  ctx_ddl.create_preference('oow_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('oow_lexer', 'index_stems', 'ENGLISH');
  ctx_ddl.set_attribute('oow_lexer', 'index_text', 'true');
  -- Custom stoplist of words to exclude from the index
  ctx_ddl.create_stoplist('oow_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('oow_stoplist', 'your'); /*…*/
  ctx_ddl.add_stopword('oow_stoplist', 'oracle');
end;
/

-- Create the index once the lexer and stoplist exist
CREATE INDEX session09_txt_idx
  ON session09_txt (session_txt)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER OOW_LEXER STOPLIST OOW_STOPLIST');

Page 26: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Session Term Scores Example

Integrate .23

Account .04

Payable .26

Imaging .62

Process .09

Management .05

Technology .17

Content .08

Collaboration .43

Page 27: 091104-M Hornick-Slides-Oracle Data Mining Case Study

TF-IDF (term frequency – inverse document frequency)

Statistical measure that evaluates the importance of a given word to a document in a corpus

Word importance increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus

Page 28: 091104-M Hornick-Slides-Oracle Data Mining Case Study

TF-IDF Example: One way to compute

Consider

A session, S1, title and abstract containing 100 words

Word 'mining' appears 6 times in S1

Term frequency (TF) for 'mining' in S1 is 6/100, or 0.06

Of 1850 sessions, say 25 contain the word 'mining'

Inverse document frequency is calculated as

ln(1850 / 25) = 4.3

TF-IDF score for 'mining' in S1 is 0.06 * 4.3, or 0.26
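The arithmetic on this slide can be checked with a short Python sketch. The function and parameter names are hypothetical. It uses the natural-log form of IDF shown above, which is only one of several common TF-IDF variants and is not necessarily the weighting Oracle Data Mining applies internally.

```python
import math

def tf_idf(term_count, doc_length, n_docs, n_docs_with_term):
    """TF-IDF as on the slide: (count / doc length) * ln(N / docs containing the term)."""
    tf = term_count / doc_length
    idf = math.log(n_docs / n_docs_with_term)
    return tf * idf

# 'mining' appears 6 times in a 100-word session; 25 of 1850 sessions contain it.
score = tf_idf(term_count=6, doc_length=100, n_docs=1850, n_docs_with_term=25)
print(round(score, 2))
```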

Page 29: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Session Term Scores Example

Specify the maximum number of terms to represent the entire corpus and to represent each document

Integrate .23

Account .04

Payable .26

Imaging .62

Process .09

Management .05

Technology .17

Content .08

Collaboration .43

Page 30: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Model Building and Scoring Details

Cluster sessions

Concatenate all session-related text

Text Mining data prep – create text index

Lexer with stemming

Custom stop word list

1000 max terms in corpus

30 max terms per document

Build k-Means model with 20 clusters (themes)

Score 2008 and 2009 sessions to identify theme probabilities

Page 31: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Clustering Results for 2008 Sessions

Theme (Cluster Name) ClusterID Count

INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 103

DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 94

CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 82

PLM-AGILE-PRODUCT-CONTACT-CENTER 23 53

SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 127

INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 148

DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 112

RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 92

ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 66

CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 77

CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 125

HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 62

SOA-BPM-SERVER-APPLICATION-FUSION 32 121

MEETING-SIG-IOUG-DATABASE-APPLICATION 33 33

EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 95

JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 52

TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 76

SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 80

12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 80

OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 69

Page 32: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Model Building and Scoring Details

Classify attendee interests in themes

Build Naïve Bayes model using 2008 attendees

Predict 2009 attendee interest in each of the 20 themes

New 2009 Attendees

Page 33: 091104-M Hornick-Slides-Oracle Data Mining Case Study

“Joe the DBA”

DB_REL_ODB_10G 1

DEV_EN_TEXT_EDITOR 1

DEV_EN_VI 1

GEOGRAPHIC_REGION Americas

INDUSTRY Aerospace

ORACLE_PARTNER Yes

JOB_TITLE_DBA 1

JOB_TITLE_SENIOR 1

ATTEND_ID

COMPANY_REVENUE

DB_REL_ODB_10G

DB_REL_ODB_8I

DB_REL_ODB_9I

DEV_EN_11G_PREVIEW

DEV_EN_BORLAND_JBUILDER

DEV_EN_ECLIPSE

DEV_EN_MS_DOT_NET

DEV_EN_MS_VISUAL_STUDIO

DEV_EN_ORA_APPS_EXPRES

DEV_EN_ORA_FORMS

DEV_EN_ORA_JDEV_10G

DEV_EN_ORA_SQL_DEV

DEV_EN_OTHER

DEV_EN_OTHER_JAVA_IDE

DEV_EN_SQL_EDITORS

DEV_EN_TEXT_EDITOR

DEV_EN_TOAD

DEV_EN_VI

GEOGRAPHIC_REGION

INDUSTRY

ORACLE_PARTNER

ORA_EBS

ORA_JDE

ORA_PS

ORA_SIEBEL

PROFIT_MAGAZINE_SUBSCRIPTION

UG_MEM_APOUC

UG_MEM_EOUC

UG_MEM_HEUG

UG_MEM_IOUG

UG_MEM_OAUG

UG_MEM_ODTUG

UG_MEM_OHUG

UG_MEM_QIUG

UG_INFO_APOUC

UG_INFO_EOUC

UG_INFO_HEUG

UG_INFO_IOUG

UG_INFO_OAUG

UG_INFO_ODTUG

UG_INFO_OHUG

UG_INFO_QIUG

UG_INFO_DO_NOT_SEND_ORA_INFO

JOB_TITLE_MANAGER

JOB_TITLE_PARTNER

JOB_TITLE_PROJECT_LEAD

JOB_TITLE_MARKETING

JOB_TITLE_PRESIDENT

JOB_TITLE_VICE

JOB_TITLE_DIRECTOR

JOB_TITLE_ARCHITECT

JOB_TITLE_ANALYST

JOB_TITLE_DBA

JOB_TITLE_DEVELOPER

JOB_TITLE_SALES

JOB_TITLE_PROD_MGR

JOB_TITLE_CHIEF_OFFICER

JOB_TITLE_CONSULTANT

JOB_TITLE_SENIOR

JOB_TITLE_STUDENT

Theme (Cluster Name) ClusterID Probability

INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0005

DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.3997

CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.0002

PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0005

SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0005

INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.2190

DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.4245

RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.3010

ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0502

CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0009

CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0098

HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0031

SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0000

MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0038

EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0031

JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0260

TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0188

SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0278

12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0075

OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0994

Attendee Attributes

Predict themes (clusters) for “Joe”

Page 34: 091104-M Hornick-Slides-Oracle Data Mining Case Study

How Does This Session Rank for Joe?

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management

Page 35: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Cluster Probabilities for Session S291749

Theme (Cluster Name) ClusterID Probability

INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0023

DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.0021

CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.9534

PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0020

SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0020

INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.0027

DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.0018

RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.0032

ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0018

CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0022

CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0026

HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0049

SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0037

MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0015

EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0016

JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0016

TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0027

SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0022

12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0037

OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0019

Page 36: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Computing this Session's Score

Specifically for Joe…

Columns: Theme (Cluster Name), ClusterID, Joe's Cluster Probability, Session S291749 Cluster Probability, Product

INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0005 0.0023 0.000001

DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.3997 0.0021 0.000848

CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.0002 0.9534 0.000216

PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0005 0.0020 0.000001

SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0005 0.0020 0.000001

INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.2190 0.0027 0.000587

DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.4245 0.0018 0.000780

RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.3010 0.0032 0.000960

ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0502 0.0018 0.000088

CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0009 0.0022 0.000002

CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0098 0.0026 0.000025

HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0031 0.0049 0.000015

SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0000 0.0037 0.000000

MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0038 0.0015 0.000006

EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0031 0.0016 0.000005

JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0260 0.0016 0.000041

TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0188 0.0027 0.000051

SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0278 0.0022 0.000062

12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0075 0.0037 0.000028

OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0994 0.0019 0.000191

SCORE: 0.003908 (sum of the Product column)
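The score for Joe and session S291749 is the dot product of the two probability vectors in the table. A minimal Python sketch follows, with the probabilities copied from the slide; the variable and function names are hypothetical. The displayed probabilities are rounded to four decimals, so the recomputed sum comes out close to the slide's 0.003908 rather than exactly equal to it.

```python
# Probabilities copied from the table, ordered by ClusterID (18-20, 23-39).
joe = [0.0005, 0.3997, 0.0002, 0.0005, 0.0005, 0.2190, 0.4245, 0.3010, 0.0502, 0.0009,
       0.0098, 0.0031, 0.0000, 0.0038, 0.0031, 0.0260, 0.0188, 0.0278, 0.0075, 0.0994]
session = [0.0023, 0.0021, 0.9534, 0.0020, 0.0020, 0.0027, 0.0018, 0.0032, 0.0018, 0.0022,
           0.0026, 0.0049, 0.0037, 0.0015, 0.0016, 0.0016, 0.0027, 0.0022, 0.0037, 0.0019]

def recommendation_score(attendee_probs, session_probs):
    """Sum of element-wise products of the attendee and session cluster probabilities."""
    return sum(a * s for a, s in zip(attendee_probs, session_probs))

score = recommendation_score(joe, session)
print(f"{score:.6f}")
```

The session scores low for Joe because its probability mass sits almost entirely in the content management theme (cluster 20), where Joe's predicted interest is 0.0002.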

Page 37: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Recommendation Score Query

select attend_id, session_id, score
from (
  -- one row per attendee/session pair: sum of per-cluster probability products
  select a.attend_id, s.session_id,
         sum(a.probability * s.probability) score
  from SESSION_TXT09_SCORES_T20 s,
       ATTENDEE09_SCORES_T20 a
  where a.prediction = s.cluster_id
  group by a.attend_id, s.session_id
)
order by attend_id, score desc

[Figure labels: Probability; Session 1 … Session N]
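For readers who think more easily in procedural terms, the join, group-by, and ordering in the query can be sketched in Python. The row layouts, names, and sample values below are made up for illustration and are not conference data.

```python
from collections import defaultdict

def rank_sessions(attendee_rows, session_rows):
    """attendee_rows: (attend_id, cluster_id, probability) tuples.
    session_rows: (session_id, cluster_id, probability) tuples.
    Returns (attend_id, session_id, score) sorted by attendee, then score descending."""
    by_cluster = defaultdict(list)
    for session_id, cluster_id, prob in session_rows:
        by_cluster[cluster_id].append((session_id, prob))

    scores = defaultdict(float)
    for attend_id, cluster_id, a_prob in attendee_rows:
        # Join on cluster id, then accumulate the product per attendee/session pair.
        for session_id, s_prob in by_cluster.get(cluster_id, []):
            scores[(attend_id, session_id)] += a_prob * s_prob

    rows = [(a, s, sc) for (a, s), sc in scores.items()]
    return sorted(rows, key=lambda row: (row[0], -row[2]))

# Made-up example with two themes and two sessions.
attendee_rows = [("A1", 1, 0.9), ("A1", 2, 0.1)]
session_rows = [("S1", 1, 0.2), ("S1", 2, 0.8), ("S2", 1, 0.7), ("S2", 2, 0.3)]
ranked = rank_sessions(attendee_rows, session_rows)
print(ranked)
```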

Page 39: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Evaluating Recommendations: Producing Training (Build) and Test Datasets

[Figure: grid with '08 Session Data on one axis and '08 Attendee Data on the other, each split into Build and Test]

Build the models using these datasets

Test the models using these datasets

Typical space for recommendations: Recommend same sessions to new attendees

Projection Mining Space: Recommend new sessions to new attendees

Cross-sell / Up-sell Space: Recommend new sessions to same attendees

Page 40: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Evaluating Results: Session Recommendation Curve

Model scores as a function of rank

Linear behavior of recommendations

Threshold separating high from low confidence recommendations

Represents the location of “hits” (attendee attended session)

Dot == Scored Session

Page 41: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Enrichment Curve

Running calculation where enrichment is maximum deviation from 0

Represents the location of “hits”

Point of maximum enrichment

[Figure axes: Recommendation; Enrichment Score]
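The slide does not give the enrichment formula, so the sketch below is an assumption rather than the authors' calculation. It uses a common Kolmogorov-Smirnov-style running sum: walk down the model-ranked sessions, step up on each hit and down on each miss, and report the maximum deviation from 0. The step sizes and function name are hypothetical, and the normalization that produces NE is not reproduced.

```python
def enrichment_score(hits):
    """hits: 0/1 flags in model-ranked order (1 = attendee attended the session).
    Returns (score, rank), where score is the running sum's maximum deviation
    from 0 and rank is the 1-based position at which it occurs."""
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    running, best, best_rank = 0.0, 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        # Assumed step sizes: hits and misses each sum to 1 over the whole list.
        running += 1.0 / n_hits if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_rank = running, rank
    return best, best_rank

score, rank = enrichment_score([1, 1, 0, 0, 0, 0])
print(score, rank)
```

Under these assumptions, hits concentrated at the top of the ranking push the score toward 1, while hits scattered at random keep the running sum near 0.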

Page 42: 091104-M Hornick-Slides-Oracle Data Mining Case Study

[Figure: three per-attendee panels plotting model score against model-ranked sessions]

Attendee W1134872 NE = 1.07 Lift = 1.55 ROC = 0.51

Attendee W1144260 NE = 1.63 Lift = 2.47 ROC = 0.71

Attendee W1152645 NE = 2.88 Lift = 3.07 ROC = 0.79

Page 43: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Global Measure of Merit

[Figure: P(NE) plotted against NE for the PM Model and the Random Model]

Normalized Enrichment

Random recommendations obtain an enrichment score of 1

Page 45: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Recommending Exhibitors and Demos

Page 46: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Recommending Exhibitors and Demos

Use clustering model from session data

Score exhibitors and demo text against 20 themes

Use existing attendee theme scores to compute recommendation scores for each exhibitor and demo

New 2009 Attendees 2009 Exhibitors and Demos

Page 47: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Computing Related Sessions

Page 48: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Computing Related Sessions

Data preparation

Focus on tracks, tags, categories

Tokenize targeted terms from title and abstract fields

E.g., “Oracle Data Mining” → “OracleDataMining”

Cluster sessions into 200 clusters using K-Means

Multiply cluster score vectors for similarity score
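Two of the steps above lend themselves to a short sketch: collapsing a targeted multi-word term into one token, and ranking related sessions by multiplying cluster score vectors. The Python below is illustrative only. The function names and sample scores are hypothetical, and it uses 3 clusters instead of 200 to stay readable.

```python
def merge_phrases(text, phrases):
    """Collapse each targeted multi-word phrase into a single token."""
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", ""))
    return text

def top_n_related(target_id, cluster_scores, n):
    """cluster_scores: dict of session_id -> list of per-cluster scores.
    Returns the n sessions with the highest dot product against the target."""
    target = cluster_scores[target_id]
    sims = [(sum(a * b for a, b in zip(target, vec)), sid)
            for sid, vec in cluster_scores.items() if sid != target_id]
    sims.sort(key=lambda pair: -pair[0])
    return [(sid, sim) for sim, sid in sims[:n]]

# Made-up cluster scores for three sessions.
scores = {"S1": [0.9, 0.1, 0.0], "S2": [0.8, 0.2, 0.0], "S3": [0.0, 0.1, 0.9]}
print(merge_phrases("Intro to Oracle Data Mining", ["Oracle Data Mining"]))
print(top_n_related("S1", scores, 1))
```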

Page 49: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Computing Related Sessions

2009 Sessions: Cluster Sessions into 2009 Themes (200 clusters)

Score each session against each theme (cluster)

2009 Session Cluster Scores Vector

Other 2009 Sessions Cluster Scores Vectors

Vector multiply each session's cluster scores against all other sessions' cluster scores for total order ranking of related sessions

Ranked Related Sessions (example scores: .95, .81, .67)

Page 52: 091104-M Hornick-Slides-Oracle Data Mining Case Study

OOW'08 Recommendation Engine Results

Distinct Schedule Builder visitors: 15667

Distinct visitors signup: 3266

Distinct visitors attended: 1775

Signup conversion rate: 20.8% (3266 / 15667)

Attended conversion rate: 11.3% (1775 / 15667)

Conversion rate: percentage of attendees who used at least 1 recommendation
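The conversion rates follow directly from the visitor counts on this slide. A small Python check is sketched below; the function and variable names are hypothetical. Computed from the counts shown, the rates come to approximately 20.8% for signup and 11.3% for attendance.

```python
def conversion_rate(converted, visitors):
    """Percentage of visitors who used at least one recommendation."""
    return 100.0 * converted / visitors

visitors = 15667                                  # distinct Schedule Builder visitors
signup_rate = conversion_rate(3266, visitors)     # signed up for a recommended session
attended_rate = conversion_rate(1775, visitors)   # attended a recommended session
print(f"signup: {signup_rate:.1f}%  attended: {attended_rate:.1f}%")
```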

Page 53: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Conversion Rates in other Domains

OOW Attended Sessions 11.3

OOW Signup Sessions 20.8

Circa 2004

Page 54: 091104-M Hornick-Slides-Oracle Data Mining Case Study

OOW'08 Recommendation Engine Results: Detail

Recommendations Signup

1768 attendees (11.3%) selected exactly 1

820 (5.2%) selected 2 recommendations

678 attendees (4.3%) selected 3 or more

32 attendees selected between 8 and 10

Actually Attended

1246 attendees (8%) attended exactly 1

382 (2.4%) attended 2 recommended sessions

147 attendees (0.9%) attended 3 or more

23 attendees attended between 5 and 9

[Chart: Recommendations: Selected vs. Attended; categories Exactly 1, Exactly 2, More than 3; series Selected Count and Attended Count; count axis from 0 to 2000]

Page 55: 091104-M Hornick-Slides-Oracle Data Mining Case Study

Summary

Oracle Data Mining provides a robust platform for Text Mining and building a Recommendation Engine

Oracle Data Mining with Oracle Data Miner code generation facilitated deployment of mining solution

Recommendation evaluation techniques show the models were able to predict sessions of interest

OOW conversion rates show that session recommendations were perceived as useful by attendees

Page 56: 091104-M Hornick-Slides-Oracle Data Mining Case Study

For More Information

search.oracle.com

or

oracle.com

www.oracle.com/technology/products/bi/odm/index.html

Oracle Data Mining

Page 60: 091104-M Hornick-Slides-Oracle Data Mining Case Study

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.

The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.