Allan Mitchell SQL Server MVP Konesans Limited www.SQLIS.com.

Page 1:

Excel – Not a Bad Data Mining Client At All

Allan Mitchell
SQL Server MVP

Konesans Limited
www.SQLIS.com

Page 2:

Who am I

• SQL Server MVP
• SQL Server Consultant
• Joint author on Wrox Professional SSIS book
• Worked with SQL Server since version 6.5
• www.SQLDTS.com and www.SQLIS.com

Page 3:

Today’s Schedule

• Mostly Demos
• Data Mining Add-In for Excel 2007
  – Added XL Functions
  – Visualisation Methods

Page 4:

Today’s Schedule

• Added XL Functions (not a lot of people know these exist)
  – DMPREDICT
  – DMPREDICTTABLEROW
  – DMCONTENTQUERY
  – Only exist after the add-in is installed

Page 5:

Today’s Schedule

• Visualisation Methods
  – Accuracy Charts
  – Classification Matrix
  – Profit Charts
  – Folding (X-Validation)
  – Calculator (if we get time)

Page 6:

Excel Functions

• DMPREDICT
• Can take a variable number of arguments, the minimum being 3.
• The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
• The second parameter is the name of the mining model that will execute the prediction.
• The third parameter is the requested predicted entity (in general a predictable column, but it could also be any prediction function).
• The function may also take up to 32 pairs of arguments. Each pair contains the value and the name of an input, in that order (value followed by name).
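The argument layout described above can be sketched with a small Python helper that assembles the formula text. This is only an illustration of the ordering rules on the slide: the helper `build_dmpredict`, the model name `BikeBuyerModel` and the column names are hypothetical, and the comma separator assumes an English-locale Excel.

```python
def build_dmpredict(connection, model, target, inputs):
    """Return a DMPREDICT worksheet formula as text.

    inputs is a list of (value, name) tuples: value first, then the
    input name, matching the order described on the slide.
    """
    if len(inputs) > 32:
        raise ValueError("DMPREDICT accepts at most 32 value/name pairs")

    def quote(text):
        # Excel escapes a double quote inside a string literal by doubling it.
        return '"' + str(text).replace('"', '""') + '"'

    def literal(value):
        # Numbers are written bare; everything else becomes a quoted string.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return repr(value)
        return quote(value)

    args = [quote(connection), quote(model), quote(target)]
    for value, name in inputs:
        args.append(literal(value))  # the value comes first...
        args.append(quote(name))     # ...followed by the input name
    return "=DMPREDICT(" + ", ".join(args) + ")"


formula = build_dmpredict(
    "",                # empty string: current (active) connection
    "BikeBuyerModel",  # hypothetical mining model name
    "Bike Buyer",      # hypothetical predictable column
    [(35, "Age"), ("M", "Gender")],
)
print(formula)
```

In a real worksheet the values would more often be cell references than literals; the sketch uses literals only to keep the value/name pairing visible.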

Page 7:

Excel Functions

• DMPREDICTTABLEROW
• The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
• The second parameter is the name of the mining model that will execute the prediction.
• The third parameter is the requested predicted entity (in general a predictable column, but it could also be any prediction function).
• The fourth parameter is a range of cells to be passed as inputs.
• The fifth parameter (optional) is a comma-separated list of column names to be used as names for the inputs.

Page 8:

Excel Functions

• DMPREDICTTABLEROW
• If the range of cells is from an XL List Object
• Column Headers taken from the List
• 5th Parameter not necessary
  – Unless Column Name != Model Column Name

Page 9:

Excel Functions

• DMCONTENTQUERY
• The first parameter is the Analysis Services connection to be used. An empty string refers to the current (active) connection.
• The second parameter is the name of the mining model that will execute the prediction.
• The third parameter is the requested content column.
• The fourth parameter is a WHERE clause to be appended to the content query.

Page 10:

DEMO
Data Mining Excel functions

Page 11:

Excel Add-In

• Great way of visualising Data Mining
• Takes away some of the mystery
• Easy to use
• Some wizards
• Freedom vs. flexibility

Page 12:

Accuracy Charts

• Compare 1-n models against
  – Another model
  – Best model
  – Thumb in the air model/no model/chance

Page 13:

Accuracy Charts

• Interpreting
  – How does a model compare with other models
  – What is the cumulative gain
  – Lift
• The real thing we want to see is.....
  – By how much do we beat the “chance” model
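Cumulative gain and lift can be illustrated with a few lines of Python. This is a minimal sketch on made-up scores, not the add-in's own calculation: cases are ranked by predicted score, the gain is the share of all positive cases found so far, and the "chance" model is assumed to find positives in proportion to the share of cases examined. The function names and data are hypothetical.

```python
def cumulative_gain(scores, actuals):
    """Return (fraction_of_cases, fraction_of_positives_found) points."""
    # Rank cases from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(actuals)
    points = []
    found = 0
    for count, (_, actual) in enumerate(ranked, start=1):
        found += actual
        points.append((count / len(ranked), found / total_positives))
    return points


def lift_at(points, fraction):
    """Lift = model gain divided by chance gain at a given fraction of cases.

    The chance model finds x of the positives after examining x of the
    cases, so its gain equals the fraction of cases examined.
    """
    for cases, gain in points:
        if cases >= fraction:
            return gain / cases
    raise ValueError("fraction must be between 0 and 1")


# Made-up predicted probabilities and actual outcomes (1 = target outcome).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

points = cumulative_gain(scores, actuals)
print(points[1])             # gain after the top 20% of cases
print(lift_at(points, 0.2))  # how much better than chance at 20%
```

A lift above 1 at a given fraction means the model beats the chance model at that point on the chart.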

Page 14:

DEMO
Accuracy Charts

Page 15:

Classification Matrix

• What are we interested in
  – How well did my model predict outcomes
  – False Positive
  – False Negative
  – True Positive
  – True Negative

Page 16:

Classification Matrix

                Predicted TRUE                   Predicted FALSE
Actual TRUE     True Positive                    False Negative (type 2 error)
Actual FALSE    False Positive (type 1 error)    True Negative
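The four cells of the matrix can be counted directly from paired actual and predicted values. The following is a small sketch with made-up data; the function name `classification_matrix` is an assumption for illustration and is not part of the add-in.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count the four outcomes for a TRUE/FALSE prediction."""
    # Each case contributes one (actual, predicted) pair.
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],
        "false_negative": counts[(True, False)],   # type 2 error
        "false_positive": counts[(False, True)],   # type 1 error
        "true_negative": counts[(False, False)],
    }


# Made-up outcomes for eight cases.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, True, False, False, False, True]

matrix = classification_matrix(actual, predicted)
print(matrix)
```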

Page 17:

Classification Matrix

• A misclassification is not always a bad thing
• Consider
  – Predicted possibility of disease
  – Extra care/treatment given
  – Real result is “No disease”
  – Example of false positive
  – Is it such a bad thing?

Page 18:

DEMO
Classification Matrix

Page 19:

Profit Charts

• Closely follows lift/cumulative gain chart
• Apply costs to efforts

Page 20:

Profit Charts

• Apply costs to
  – Initial/Fixed outlay
  – Cost per case
  – Return per case
• Target predictable column
• Target Outcome
• Count of cases to use
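One common way to turn these inputs into a profit curve is sketched below. It assumes that contacting the top n ranked cases costs the fixed outlay plus n times the cost per case, and that each case with the target outcome brings in the return per case. This formula, the function name and the figures are assumptions for illustration; the add-in's exact calculation may differ.

```python
def profit_curve(scores, actuals, fixed_cost, cost_per_case, return_per_case):
    """Profit from acting on the top n ranked cases, for n = 1..len(scores)."""
    # Rank cases from the highest predicted score to the lowest.
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    curve = []
    hits = 0
    for n, (_, actual) in enumerate(ranked, start=1):
        hits += actual  # cases so far that really had the target outcome
        curve.append(hits * return_per_case - n * cost_per_case - fixed_cost)
    return curve


# Made-up predicted probabilities and actual outcomes (1 = target outcome).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

curve = profit_curve(scores, actuals, fixed_cost=10, cost_per_case=2, return_per_case=15)
best_n = curve.index(max(curve)) + 1  # number of cases giving the peak profit
print(curve)
print(best_n)
```

The peak of the curve suggests how far down the ranked list it is still worth going before the cost per case outweighs the expected return.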

Page 21:

DEMO
Profit Chart

Page 22:

X-Validation/Folding/Rotation Estimation

• Validates your model
• Tests whether the model is generally applicable
• Large variations in results between partitions
  – Model not generally applicable
  – May need tuning

Page 23:

X-Validation/Folding/Rotation Estimation

• Stratified K-Fold Cross Validation
• Creates K folds
  – Representative partitions
• Holds one partition out
• Trains model with others
• Tests with holdout partition
• Repeat (different holdout/test partition) * K
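The steps above can be sketched in plain Python. This is a simplified illustration, not the Analysis Services implementation: the fold assignment is a basic round-robin per class, and the "model" is a hypothetical majority-class predictor standing in for a real mining model.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split case indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold is representative.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(cases, labels, k, train, score):
    """Hold each fold out once, train on the rest, test on the holdout."""
    results = []
    for holdout in stratified_folds(labels, k):
        held = set(holdout)
        train_idx = [i for i in range(len(cases)) if i not in held]
        model = train([cases[i] for i in train_idx], [labels[i] for i in train_idx])
        results.append(
            score(model, [cases[i] for i in holdout], [labels[i] for i in holdout])
        )
    return results


def train_majority(cases, labels):
    # Stand-in "model": always predict the most common training label.
    return Counter(labels).most_common(1)[0][0]


def accuracy(model, cases, labels):
    return sum(1 for label in labels if label == model) / len(labels)


# Made-up data: 6 "yes" cases and 12 "no" cases.
labels = ["yes"] * 6 + ["no"] * 12
cases = list(range(len(labels)))

folds = stratified_folds(labels, k=3)
results = cross_validate(cases, labels, 3, train_majority, accuracy)
print(results)  # one score per holdout partition
```

Scores that are close to each other across partitions suggest the model generalises; a wide spread is the warning sign described on the previous slide.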

Page 24:

DEMO
X-Validation/Folding/Rotation Estimation

Page 25:

Prediction Calculator

• Set costs and profits associated with
  – Getting the prediction right
  – Getting the prediction wrong
• See profit curves
• See profit threshold scores
• Pad for entering new data
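The idea of a profit threshold score can be sketched as follows. This assumes each case has a numeric score, that a case is predicted positive when its score reaches the threshold, and that each of the four outcomes carries a profit or a cost. The function names, scores and amounts are made up for illustration and do not reproduce the add-in's own scoring.

```python
def profit_at_threshold(scores, actuals, threshold,
                        tp_profit, tn_profit, fp_cost, fn_cost):
    """Total profit if every case scoring >= threshold is predicted positive."""
    total = 0
    for score, actual in zip(scores, actuals):
        predicted = score >= threshold
        if predicted and actual:
            total += tp_profit   # prediction right: positive found
        elif predicted and not actual:
            total -= fp_cost     # prediction wrong: false positive
        elif actual:
            total -= fn_cost     # prediction wrong: false negative
        else:
            total += tn_profit   # prediction right: negative left alone
    return total


def best_threshold(scores, actuals, **amounts):
    """Try each observed score as a threshold and keep the most profitable."""
    return max(
        sorted(set(scores)),
        key=lambda t: profit_at_threshold(scores, actuals, t, **amounts),
    )


# Made-up scores and actual outcomes.
scores = [90, 80, 70, 60, 50, 40, 30, 20, 10, 5]
actuals = [True, True, False, True, False, False, True, False, False, False]
amounts = dict(tp_profit=10, tn_profit=0, fp_cost=4, fn_cost=2)

threshold = best_threshold(scores, actuals, **amounts)
print(threshold)
print(profit_at_threshold(scores, actuals, threshold, **amounts))
```

Raising the cost of a false negative relative to a false positive pushes the best threshold down, which mirrors the disease example earlier in the deck.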

Page 26:

Prediction Calculator

• Cloud Version available
• Print version available for later data entry
• Easy to use
• Easy to understand

Page 27:

DEMO
Prediction Calculator

Page 28:

Thank you…
[email protected]