Data Preparation in the Quadstone System Version 5

Thursday, 17th February 2005
7.30am PST / 10.30am EST / 3.30pm GMT / 16.30 CET

Please join the teleconference call now; if you have any difficulty, contact support@quadstone.com.

Starting in 15 minutes / Starting in 10 minutes / Starting in 5 minutes / Starting in 2 minutes / Starting now


© 2005 Quadstone

Data Preparation in the Quadstone System V5

• Presenter: Joshua Lewis, Quadstone Consultant
• Overview: Interactive data-preparation: sorting, aggregating, joining, and deriving
• Audience: Experienced Quadstone System users looking to undertake ad-hoc preparation of data
• Format:
  • A live demo with slides for sign-posting
  • Follow-up exercises in the form of a workbook and dataset
• Duration: 1 hour, including Q&A


How to ask questions

• Return to WebEx Event Manager
• Use Q&A (not Chat)
• You can return to full-screen view



Interactive data preparation

• Wizards for each action
  • Right-click in Quadstone System Explorer (QSE), or
  • Click on (or drag file to) Quadstone System Shortcut Bar
  • Choose parameters as appropriate

Best practice: keep an audit trail


A simple data preparation process

[Diagram: Transaction data and Customer data are combined into an Analysis dataset through the steps SORT, SORT, MEASURE, JOIN, DERIVE and DERIVE. Other labels: "Customer IDs", "To be filled".]


Joining foci in QSE

Join Fields adds fields from a secondary focus to a primary focus.

Pre-requisites:
• Key fields in both foci, of the exact same datatype
• The records in both foci must be sorted by the key fields

• Subtly different from Import Fields in Decisionhouse


Field derivation

Before derivation:

CustomerID  StartDate   Age   Gender
163004      22/10/1998  NULL  1
187006      09/05/1999  NULL  1
188008      01/09/1996  49    2
190006      04/11/1997  43    1
268006      02/03/1995  52    1
36004       22/01/1995  72    1
...

After derivation:

CustomerID  StartDate   Age   Gender  MonthsTenure
163004      22/10/1998  NULL  1       10
187006      09/05/1999  NULL  1       3
188008      01/09/1996  49    2       35
190006      04/11/1997  43    1       21
268006      02/03/1995  52    1       53


Deriving new fields

• Requires:
  • A focus (need not be sorted)
  • A derivations (.tml) file containing TML descriptions of the fields to be derived
• Create a new TML file via right-click in QSE, then right-click again to Edit

Best practice: Develop and debug the FDL expressions interactively in Decisionhouse, before embedding them within the TML file


Derivation syntax

create Tenure := countwholemonths(StartDate,today());
create YoungMan := Age < 30 and Gender = 1;

• Creates one output record per input record
• The first example counts the number of months since a person became a customer; the second creates a flag to identify a specific segment of customers
• General syntax:
create <fieldname> := <FDL expression> ;

• See online help: Field Derivation Language (FDL) reference
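For readers without the Quadstone System to hand, the two derivations can be approximated in Python. This is only a sketch: it assumes countwholemonths counts completed calendar months, the function names, sample records and reference date are hypothetical, and FDL's own handling of nulls may differ from the choice made here.

```python
from datetime import date


def count_whole_months(start: date, end: date) -> int:
    """Completed calendar months from start to end (assumed semantics)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # The final month only counts once the day-of-month has been reached.
    if end.day < start.day:
        months -= 1
    return months


def derive(record: dict, today: date) -> dict:
    """Return a copy of one input record with two derived fields added."""
    out = dict(record)
    out["Tenure"] = count_whole_months(record["StartDate"], today)
    age = record["Age"]
    # Assumption: a null Age gives a null flag; FDL may treat nulls differently.
    out["YoungMan"] = None if age is None else (age < 30 and record["Gender"] == 1)
    return out


# Hypothetical records and reference date, for illustration only.
customers = [
    {"CustomerID": 1, "StartDate": date(1998, 10, 22), "Age": None, "Gender": 1},
    {"CustomerID": 2, "StartDate": date(1996, 9, 1), "Age": 25, "Gender": 1},
    {"CustomerID": 3, "StartDate": date(1997, 11, 4), "Age": 43, "Gender": 2},
]
derived = [derive(c, today=date(1999, 8, 25)) for c in customers]
for row in derived:
    print(row["CustomerID"], row["Tenure"], row["YoungMan"])
```

As on the slide, each input record produces exactly one output record, and the input need not be sorted.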


Sorting foci

• Required when combining foci and grouping records

• Sort will usually, but not always, be on customer ID

Best practice: check sort order first, to see if a sort is needed

Best practice: sort once, upstream (adding keys if needed), to minimize time-consuming re-sorting
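The two best practices can be illustrated outside the Quadstone System with a short Python sketch. It is not how the Quadstone sort works internally; the helper names and sample records are hypothetical, and null keys are not handled.

```python
from operator import itemgetter


def is_sorted_by(records, key_fields):
    """True if the list of dict records is in ascending order of the key fields."""
    key = itemgetter(*key_fields)
    return all(key(a) <= key(b) for a, b in zip(records, records[1:]))


def sort_if_needed(records, key_fields):
    """Return records unchanged when already ordered; otherwise a sorted copy."""
    if is_sorted_by(records, key_fields):
        return records
    return sorted(records, key=itemgetter(*key_fields))


# Hypothetical transaction records; dates are ISO strings so they compare correctly.
transactions = [
    {"CustomerID": 2, "Date": "1998-02-03"},
    {"CustomerID": 1, "Date": "1998-02-07"},
    {"CustomerID": 1, "Date": "1998-02-05"},
]
by_customer = sort_if_needed(transactions, ["CustomerID"])
# Sorting once on (CustomerID, Date) also satisfies any later step keyed on CustomerID alone.
by_customer_date = sort_if_needed(by_customer, ["CustomerID", "Date"])
```

Checking first is a single pass over the data, which is cheap compared with a full sort; adding the extra key upstream means later steps that need either ordering can skip re-sorting.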


Transaction measurement

Transaction records:

CustomerID  Date        TransType  Value
6434000000  05/02/1998  ATM        30.00
6434000000  07/02/1998  ATM        54.42
6434000000  08/02/1998  SD         83.80
6434000001  02/02/1998  SD         29.49
6434000001  05/03/1998  SD         40.00
6434000002  03/02/1998  SD         46.26
...

After rollup (aggregation):

CustomerID  MostRecentDate  MostFreqTrans  AverageValue
6434000000  08/02/1998      ATM            56.07
6434000001  05/03/1998      SD             34.75
6434000002  03/02/1998      SD             46.26
...
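The rollup on this slide can be mimicked in Python as a rough illustration. The sample records mirror the slide's example; the function name and the tie-breaking rule for the most frequent transaction type are assumptions, not Quadstone behaviour.

```python
from collections import Counter
from datetime import date
from itertools import groupby
from operator import itemgetter

# (CustomerID, Date, TransType, Value), already sorted by CustomerID.
transactions = [
    (6434000000, date(1998, 2, 5), "ATM", 30.00),
    (6434000000, date(1998, 2, 7), "ATM", 54.42),
    (6434000000, date(1998, 2, 8), "SD", 83.80),
    (6434000001, date(1998, 2, 2), "SD", 29.49),
    (6434000001, date(1998, 3, 5), "SD", 40.00),
    (6434000002, date(1998, 2, 3), "SD", 46.26),
]


def rollup(rows):
    """Collapse each run of rows sharing a CustomerID into one summary record."""
    summary = []
    for customer_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        values = [value for _, _, _, value in group]
        types = Counter(trans_type for _, _, trans_type, _ in group)
        summary.append({
            "CustomerID": customer_id,
            "MostRecentDate": max(d for _, d, _, _ in group),
            # On a tie, Counter keeps the type seen first (an arbitrary choice here).
            "MostFreqTrans": types.most_common(1)[0][0],
            "AverageValue": sum(values) / len(values),
        })
    return summary


for row in rollup(transactions):
    print(row["CustomerID"], row["MostRecentDate"], row["MostFreqTrans"],
          f'{row["AverageValue"]:.2f}')
```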


Simple aggregation

• Requires:
  • A focus (sorted by the grouping key field, e.g., CustomerID)
  • Selection of an appropriate key field (for grouping records)
  • An aggregations (.tml) file containing TML descriptions of the aggregations

Ignore the Functions and Statistics options

Example TML files are in the online help and the ext/demo/dbc folder of your installation


Aggregation syntax

create NumberOfPurchases := count();
create ValueOfPurchases := sum(Amount);

• Processes each group of transaction records that share the same CustomerID (grouping key), to create one output record per CustomerID

• The first example counts the number of transactions for each customer; the second sums the values in the Amount field for each customer

• General syntax:
create <fieldname> := <aggfn>( <arguments> );

• See online help: Transaction Measurement Language (TML) reference
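The shape of an aggregations file, one named output field per aggregate function, can be sketched in Python as follows. This is an analogy rather than TML: the dictionary, the measure function and the sample purchases are all hypothetical.

```python
from itertools import groupby

# Each entry plays the role of one "create <fieldname> := <aggfn>(<arguments>);" line.
AGGREGATIONS = {
    "NumberOfPurchases": lambda group: len(group),                       # count()
    "ValueOfPurchases": lambda group: sum(r["Amount"] for r in group),   # sum(Amount)
}


def measure(records, key_field, aggregations):
    """Yield one output record per run of consecutive records sharing the key."""
    for key, group in groupby(records, key=lambda r: r[key_field]):
        group = list(group)
        row = {key_field: key}
        for field_name, agg_fn in aggregations.items():
            row[field_name] = agg_fn(group)
        yield row


# Hypothetical purchases, sorted by CustomerID.
purchases = [
    {"CustomerID": "A", "Amount": 10.0},
    {"CustomerID": "A", "Amount": 5.5},
    {"CustomerID": "B", "Amount": 7.0},
]
result = list(measure(purchases, "CustomerID", AGGREGATIONS))

# The same records out of order split customer A into two groups.
unsorted = [purchases[0], purchases[2], purchases[1]]
broken = list(measure(unsorted, "CustomerID", AGGREGATIONS))
```

Because groups are formed from consecutive records, the sketch only gives one record per customer when the input is sorted by the grouping key, which is the same pre-requisite the slides state for aggregation.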


Joining foci

Customers.ftr (sorted):

ID  Age
A   56
B   23
C   31

Visits.ftr (sorted):

ID  TotalVisits
A   3
C   2
D   4

Join on ID gives Customers.ftr:

ID  Age  TotalVisits
A   56   3
B   23   Null
C   31   2
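The behaviour in this example (every primary record kept, unmatched ones given a null, unmatched secondary records dropped) can be reproduced with a single-pass sorted join in Python. This is a sketch under stated assumptions: both inputs are sorted on the key, keys are unique in the secondary input, and the function name is hypothetical rather than a Quadstone API.

```python
def join_fields(primary, secondary, key):
    """Add the secondary's fields to each primary record, matching on key.

    Both lists must already be sorted by key; unmatched primary records get None.
    """
    extra_fields = [f for f in (secondary[0] if secondary else {}) if f != key]
    joined = []
    j = 0  # cursor into the secondary records; it only ever moves forward
    for row in primary:
        # Skip secondary records whose key is smaller than the current primary key.
        while j < len(secondary) and secondary[j][key] < row[key]:
            j += 1
        matched = j < len(secondary) and secondary[j][key] == row[key]
        out = dict(row)
        for field in extra_fields:
            out[field] = secondary[j][field] if matched else None
        joined.append(out)
    return joined


customers = [{"ID": "A", "Age": 56}, {"ID": "B", "Age": 23}, {"ID": "C", "Age": 31}]
visits = [
    {"ID": "A", "TotalVisits": 3},
    {"ID": "C", "TotalVisits": 2},
    {"ID": "D", "TotalVisits": 4},
]
for row in join_fields(customers, visits, "ID"):
    print(row)
```

The forward-only cursor is why both inputs need the same sort order: each dataset is read once, with no lookups back into earlier records.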


Combining datasets

Append fields: abut equal-length datasets

Join fields: match on common key(s)

Merge records: interleave using common key(s)
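To make the difference between appending fields and merging records concrete, here is a small Python sketch of both. The function names and sample data are hypothetical, and the exact Quadstone behaviour (for example, how clashing field names or duplicate keys are treated) may differ.

```python
import heapq


def append_fields(left, right):
    """Abut two equal-length datasets: record i gains the fields of right[i]."""
    if len(left) != len(right):
        raise ValueError("append requires datasets of equal length")
    return [{**a, **b} for a, b in zip(left, right)]


def merge_records(first, second, key):
    """Interleave two key-sorted datasets into one key-sorted dataset."""
    return list(heapq.merge(first, second, key=lambda r: r[key]))


# Append fields: same records, extra columns.
ages = [{"ID": "A", "Age": 56}, {"ID": "B", "Age": 23}]
genders = [{"Gender": 1}, {"Gender": 2}]
wide = append_fields(ages, genders)

# Merge records: same columns, extra rows, order preserved.
batch_one = [{"ID": "A"}, {"ID": "C"}]
batch_two = [{"ID": "B"}, {"ID": "D"}]
long = merge_records(batch_one, batch_two, "ID")
```

Appending widens the dataset without looking at any key, whereas merging lengthens it and relies on both inputs already being sorted on the common key.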



Metadata

• You can import metadata from a previous dataset using a template focus
  • Includes all derivations, selections, binnings, interpretations and comments
  • Allows re-use of metadata developed interactively in Decisionhouse
• You can import metadata in XML form
  • Allows metadata (e.g. a data dictionary) to be maintained externally (converted to XML)
  • Currently supports comments only


Advanced data preparation

[Diagram: data-build commands grouped by task around a central FOCUS]

IMPORTING: qsdbaccess, qsimportdb, qsgenfdd, qsimportflat, qsimportfocus
MEASURING: qsderive, qsmeasure, qsmeasuretrack
COMBINING: qsjoin, qsappendfields, qsmerge
TRANSFORMING: qssort, qsrenamefields, qsselect
ENHANCING: qsimportmetadata, qsupdate, [qsinterp], [qsexportmetadata]
REPORTING: qsdescribe, qsaudit, qsdtsnapshot, qsscsnapshot, qsxt, qsxt2spec, [qsinfo]
EXPORTING: qsdbcreatetable, qsdbinsert, qsdbupdate, qsexportflat
MANAGING: qscopy, qslink, qsmove, qsremove
Other commands shown: qsbuild, qstml, qssettings

• Advanced TML aggregation syntax (filter, split)
• Data Build Commands
• Data Build Manager


Where to find out more

• Quadstone System Help; for example:
  • Working with flat files, database tables, and foci
  • Transaction Measurement Language (TML) reference
  • Field Derivation Language (FDL) reference
• Quadstone System data-build command and TML reference
  • Examples of TML
• ext/demo/dbc folder of your installation
  • More example TML and data

• Quadstone System Support website: http://support.quadstone.com/

• Advanced Data Preparation training course: contact [email protected]


Questions and answers


Upcoming webinars

See www.quadstone.com/training/webinars/. If there’s a webinar topic you’d like to see, please let us know via [email protected].

Pragmatic Scorecarding

March 17, 2005 14:00 UK/Ireland, 15:00 Central European, 9am Eastern

Pragmatic Scorecarding

March 18, 2005 9am Pacific, 11am Central, 12noon Eastern, 5pm UK/Ireland

The Quadstone Portal

April 14, 2005 14:00 UK/Ireland, 15:00 Central European, 9am Eastern

The Quadstone Portal

April 15, 2005 9am Pacific, 11am Central, 12noon Eastern, 5pm UK/Ireland