Inspire 2015 - Alteryx: Data Blending: Best Practices
Transcript of Inspire 2015 - Alteryx: Data Blending: Best Practices
#inspire15
Data Blending: Best Practices
Tuesday, May 19, 2015
Ben Gomez, Senior Product Manager, Alteryx
Dr. Poornima Farrar, Product Manager, Alteryx
Agenda
• Develop Workflows Effectively
  • Evaluate the Data
  • Sample the Data
• Develop Clear Workflows
  • Rename Fields
  • Simplify the Process
• Develop Efficient Workflows
  • Sort Data Sparingly
  • Organize Data Sources
  • Process Near the Data
Effective Workflow Development
Evaluate the Data
• Data problems can slow down your workflow development or give you invalid results
  • Duplicate records
  • Missing values
  • Unexpected characters
  • Invalid values or ranges
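The four problem types above can be counted with a few lines of plain Python. This is only a rough sketch of the kind of check the Field Summary demo performs, not how the Alteryx tool works; the sample records, the field names, and the 0 to 120 age range are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample records; every field name and value here is invented.
records = [
    {"id": "1", "email": "a@example.com", "age": "34"},
    {"id": "2", "email": "", "age": "29"},
    {"id": "2", "email": "", "age": "29"},           # exact duplicate of the row above
    {"id": "3", "email": "c@exa\tmple.com", "age": "210"},
]

def profile(rows, age_range=(0, 120)):
    """Count duplicate rows, missing values, unexpected characters and out-of-range ages."""
    # A row is a duplicate if all of its field/value pairs repeat an earlier row.
    row_counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in row_counts.values() if n > 1)

    # Treat the empty string as a missing value (an assumption of this sketch).
    missing = sum(1 for r in rows for v in r.values() if v == "")

    # Flag values containing non-printable characters such as tabs.
    unexpected = sum(
        1 for r in rows for v in r.values()
        if any(not ch.isprintable() for ch in v)
    )

    # Flag numeric ages outside the assumed valid range.
    low, high = age_range
    out_of_range = sum(
        1 for r in rows
        if r["age"].isdigit() and not low <= int(r["age"]) <= high
    )

    return {
        "duplicates": duplicates,
        "missing": missing,
        "unexpected_chars": unexpected,
        "out_of_range": out_of_range,
    }

summary = profile(records)
print(summary)
```

Running a check like this before building the rest of the workflow surfaces bad records while they are still cheap to fix.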
Demo – Field Summary
Sample the Data
Sample limits the data stream to a number, percentage or random set of records.
Random % Sample generates a random number or percentage of records passing through the data stream.
Oversample Field samples incoming data to ensure equal representation of data values.
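The three sampling styles described above can be approximated in plain Python. This is a sketch under stated assumptions, not the Alteryx tools themselves: the function names, the fixed seed, and the "churned" field are hypothetical, and `balanced_sample` reaches equal representation by downsampling each value to the size of the rarest one, which may differ from how Oversample Field is configured.

```python
import random
from collections import defaultdict

def first_n(rows, n):
    """Limit the stream to the first n records."""
    return rows[:n]

def random_percent(rows, percent, seed=0):
    """Return a random sample containing the given percentage of records."""
    rng = random.Random(seed)  # fixed seed so development runs are repeatable
    k = round(len(rows) * percent / 100)
    return rng.sample(rows, k)

def balanced_sample(rows, field, seed=0):
    """Sample so that every value of `field` is equally represented."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    size = min(len(group) for group in groups.values())
    sampled = []
    for group in groups.values():
        sampled.extend(rng.sample(group, size))
    return sampled

# Hypothetical data: 100 records, 10 of which have churned == True.
rows = [{"id": i, "churned": i % 10 == 0} for i in range(100)]

print(len(first_n(rows, 5)))
print(len(random_percent(rows, 25)))
print(len(balanced_sample(rows, "churned")))
```

Developing against a small sample keeps each test run short; the sample step can be removed once the workflow logic is settled.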
Clear Workflows
Rename Fields
Simplify the Process
How would one parse an email address?
[email protected]
([^@]*)(@)([^\.]*)(.*)
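The regular expression from the slide has four capture groups: everything before the "@", the "@" itself, the domain name up to the first dot, and the rest. A minimal sketch of applying it with Python's `re` module follows; the `split_email` helper and the sample address are hypothetical, chosen only to show what each group captures.

```python
import re

# Pattern copied from the slide: local part, "@", domain up to the first dot, remainder.
EMAIL_PARTS = re.compile(r"([^@]*)(@)([^\.]*)(.*)")

def split_email(address):
    """Split an address into its parts, or return None if it has no '@'."""
    match = EMAIL_PARTS.fullmatch(address)
    if match is None:
        return None
    local, _at, domain, suffix = match.groups()
    return {"local": local, "domain": domain, "suffix": suffix}

# Hypothetical address used only for illustration.
parts = split_email("jane.doe@example.com")
print(parts)
```

One expression that captures all the parts at once replaces a chain of separate find-and-trim steps, which is the simplification this slide is after.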
Demo - Parsing
Demo - Data Macros
Efficient Workflows
Sort Data Sparingly
• Sorting is an expensive operation.
• Sorting is necessary for several operations.
• When sorting, the more data in each record, the longer the sort will take.
• Alteryx holds onto a sort if possible.
  • Formula resets the sort.
  • Sorting by a new field resets the sort.
Demo - Sorting
Gathering Data Sources
http://www.alteryx.com/technical-specifications#data-sources
Configuring Data Sources
• Format Selection
• Bulk Load
Processing Near the Data
• Private/Public Server
• Amazon Redshift and S3
• Marketo and Salesforce
Summary
• Evaluate and clean your data (Field Summary Tool)
• Simplify your process when possible
• Rename your fields
• Control your sorts
  • Set data aside and rejoin it later
    • Best: Add a Record ID field early that can be used to rejoin records later
    • More Advanced: Keep track of records and join by record position
• Create Input Macros
• Keep your processing close to your data sources
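The "set data aside and rejoin it later" advice can be sketched in plain Python: tag each record with an ID early, sort only the narrow fields, then reattach the wide fields by ID. This is an illustration under assumptions rather than an Alteryx workflow; the helper names, the `record_id` field, and the sample rows are all hypothetical.

```python
def add_record_id(rows):
    """Tag each record with a sequential ID as early as possible."""
    return [{"record_id": i, **row} for i, row in enumerate(rows, start=1)]

def set_aside(rows, keep):
    """Split records into narrow rows (ID plus `keep` fields) and wide fields keyed by ID."""
    narrow = [{"record_id": r["record_id"], **{k: r[k] for k in keep}} for r in rows]
    aside = {
        r["record_id"]: {k: v for k, v in r.items() if k != "record_id" and k not in keep}
        for r in rows
    }
    return narrow, aside

def rejoin(narrow, aside):
    """Reattach the wide fields to each narrow record by record ID."""
    return [{**r, **aside[r["record_id"]]} for r in narrow]

# Hypothetical records: "notes" stands in for wide fields that are costly to sort.
rows = [
    {"name": "Cy", "notes": "wide payload C"},
    {"name": "Ann", "notes": "wide payload A"},
    {"name": "Bo", "notes": "wide payload B"},
]

narrow, aside = set_aside(add_record_id(rows), keep=["name"])
narrow.sort(key=lambda r: r["name"])  # the sort only moves the small records
result = rejoin(narrow, aside)
print(result)
```

Because the sort never touches the wide fields, its cost depends only on the key and the ID, which is the point of the "more data in each record, the longer the sort" bullet earlier in the deck.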
THANK YOU!