Join us for Joins (The Joy in Joins!!)
Terrence Maas, Software Engineer
#TC18
Joanna Chen, Software Engineer
Agenda
Joins – Why the hype?
Intro to Tableau Prep
Practical Join Cases:
1. Cleaning and Filtering
2. Outer joins – Capture ALL the Values!
3. Range joins – Visualizing Occupancy
4. Self Joins – Correlations and Compositions
Q&A
What’s so great about joins?
Why do we use joins?
Connect Data
• Fundamental purpose of joins.
• Data derives most of its value by connecting to other data.
Reshape Data
• Different kinds of joins enable different kinds of analysis.
Select Data
• Join Conditions provide a bar for your data to meet.
• Tableau Prep uses this to help you clean your data.
Terrence’s Restaurant
Tables:
Employees
Schedules
Orders
Menu Items
Questions:
• Am I collecting good quality data?
• When is my restaurant busiest?
• What menu items are most popular? What about pairings?
Tableau Prep!
Tableau Prep
Tableau’s newest product offering
Released in April 2018
Mission Statement
Tableau Prep empowers more people to get to analysis faster by helping them quickly and confidently combine, shape, and clean their data.
Tableau Prep
Connection Pane
Flow Pane
Steps in the Flow
Profile Pane
Changes Pane
Data Grid
Joins in Tableau Prep
Configuring Your Join in Prep
Join clauses describe how columns in the two tables relate to each other.
Join types control which rows from the two tables are included or excluded, depending on the join clauses defined above.
The Summary of Join Results shows you the distribution of values that are included and excluded from the tables in the join.
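The three pieces above (join clause, join type, summary of results) can be sketched outside of Tableau Prep in a few lines of plain Python. This is only an illustration: the sample rows, the column names, and the shape of the `summary` dictionary are assumptions made up for this example, not what Prep produces internally.

```python
# Hypothetical sample data; table and column names are assumptions.
employees = [
    {"employee_id": 1, "name": "Ana"},
    {"employee_id": 2, "name": "Ben"},
    {"employee_id": 3, "name": "Cy"},
]
orders = [
    {"order_id": 100, "employee_id": 1},
    {"order_id": 101, "employee_id": 1},
    {"order_id": 102, "employee_id": 4},  # no employee has this id
]

# Join clause: orders.employee_id = employees.employee_id
# Join type: inner, so only rows that satisfy the clause are kept.
inner = [
    {**o, "name": e["name"]}
    for o in orders
    for e in employees
    if o["employee_id"] == e["employee_id"]
]

# Summary of join results: how many rows of each table were
# included in or excluded from the inner join.
employee_ids = {e["employee_id"] for e in employees}
order_employee_ids = {o["employee_id"] for o in orders}
summary = {
    "orders_included": sum(o["employee_id"] in employee_ids for o in orders),
    "orders_excluded": sum(o["employee_id"] not in employee_ids for o in orders),
    "employees_included": sum(e["employee_id"] in order_employee_ids for e in employees),
    "employees_excluded": sum(e["employee_id"] not in order_employee_ids for e in employees),
}
print(summary)
```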
Joins in Tableau Prep
Join Clauses Pane
• Displays the columns from your join clause(s) side by side in one view
• Unmatched values are highlighted in red
Joins: Getting the Most Out of Your Data
Case 1: Cleaning and Filtering
In every join…
Rows are matched according to the join condition.
In Tableau Prep…
We pay equal attention to both matched and unmatched data.
In general…
If a table is a source of truth, it can be a powerful tool for cleaning your data.
Case 1: Cleaning and Filtering
Scenario:
• Terrence’s employees manually record restaurant orders.
• Terrence has a table of Employees, which he knows is accurate and error-free.
• He has a table of Orders that may contain errors.
DEMO!
Case 1: Cleaning and Filtering
Summary
• A table with reliable data can be used in a join to verify the quality of another table.
• In a join, mismatched values are accented in red.
• Clean values directly in the join pane.
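As a rough sketch of the same idea outside of Prep: treat the Employees table as the source of truth, list the values in Orders that fail to match it, then apply corrections and check again. The names, the `server` column, and the `fixes` mapping are hypothetical; the mapping stands in for edits you would make by hand in the join pane.

```python
# Trusted table: employee names known to be accurate (hypothetical values).
employees = {"Alice", "Bob", "Carmen"}

# Manually recorded orders that may contain typos.
orders = [
    {"order_id": 1, "server": "Alice"},
    {"order_id": 2, "server": "Bobb"},
    {"order_id": 3, "server": "carmen"},
    {"order_id": 4, "server": "Alice"},
]

# Values in Orders with no match in Employees (the ones shown in red).
unmatched = sorted({o["server"] for o in orders} - employees)

# Corrections chosen by a person after looking at the unmatched values.
fixes = {"Bobb": "Bob", "carmen": "Carmen"}
cleaned = [{**o, "server": fixes.get(o["server"], o["server"])} for o in orders]

# After cleaning, every order should match an employee.
still_unmatched = {o["server"] for o in cleaned} - employees
print(unmatched, still_unmatched)
```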
Case 2: Outer Joins
Inner Joins
• Default join type
• Only rows that match our join clause(s) are kept in the join result
Inner Join
Case 2: Outer Joins
Outer Joins
• Unmatched values can be important.
• Outer joins allow us to include unmatched data from one or more tables.
Left Join, Right Join, Full Outer Join
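To make the difference between the join types concrete, here is a minimal pure-Python sketch of a single-key equality join. The `join` helper, its `how` parameter, and the tiny tables are all invented for illustration; real tools handle multiple clauses and duplicate column names far more carefully.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; missing sides are filled with None."""
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    result = []
    matched_right = set()
    for lrow in left:
        hits = [(i, rrow) for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i, rrow in hits:
            matched_right.add(i)
            result.append({**lrow, **rrow})
        # Left and full joins keep left rows that found no partner.
        if not hits and how in ("left", "full"):
            result.append({**dict.fromkeys(right_cols), **lrow})
    # Right and full joins keep right rows that found no partner.
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]

for how in ("inner", "left", "right", "full"):
    print(how, join(left, right, "k", how))
```

Only `k = 2` appears in both tables, so the inner join returns one row; each outer variant adds the unmatched rows from the side(s) it preserves.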
Case 2: Outer Joins
Filling out dimensions/axes
• All possible values might not show up in the data.
• Some data is best represented on a complete axis / dimension.
Including unmatched values
• Outer joins allow us to include unmatched values in our result.
Case 2: Outer Joins
Back to Terrence’s Restaurant…
• How is the menu doing?
• What about unordered items?
Case 2: Outer Joins
Join Orders Table with Menu Table
• The Orders table alone only shows items ordered at least once.
• Use the rows from the Menu table to “fill in” the missing items not present in the Orders table.
Orders Menu
DEMO!
Case 2: Outer Joins
Summary
• Not every item on the menu has been ordered.
• Join Orders with Menu to fill in the complete set of menu items when we visualize in Tableau.
• Change the join type from inner join to right join to include the unmatched values.
• Takeaway: use outer joins to fill out a dimension
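A minimal sketch of this takeaway using Python's built-in `sqlite3` module. The table layouts and menu items are assumptions for illustration. Orders RIGHT JOIN Menu is written here as Menu LEFT JOIN Orders, which is the same join with the tables swapped and runs on any SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu (item TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER, item TEXT);
    INSERT INTO menu VALUES ('Burger'), ('Salad'), ('Soup');
    INSERT INTO orders VALUES (1, 'Burger'), (2, 'Burger'), (3, 'Salad');
""")

# Every menu item is kept; items never ordered get NULL order columns,
# and COUNT(orders.order_id) ignores NULLs, so they show up with 0.
rows = conn.execute("""
    SELECT menu.item, COUNT(orders.order_id) AS times_ordered
    FROM menu
    LEFT JOIN orders ON orders.item = menu.item
    GROUP BY menu.item
    ORDER BY menu.item
""").fetchall()

for item, times_ordered in rows:
    print(item, times_ordered)
```

With an inner join, Soup would disappear from the result entirely; the outer join keeps it on the axis with a count of zero.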
Case 2: Outer Joins
Full Outer Joins
• Not commonly used.
• Includes unmatched data from both tables in join result.
Full Outer Join
Case 2: Outer Joins
Back at Terrence’s Restaurant…
• Some employees are not assigned a shift
• Some shifts do not have employees assigned
DEMO!
Case 2: Outer Joins
Full Outer Joins
• Every employee who has a shift assigned and every shift that has an employee assigned
• Every employee who does not have a shift assigned
• Every shift that does not have an employee assigned
Case 2: Outer Joins
Summary
• Full Outer Join Employees with Schedules
• Change join type to ‘unmatched only’ to see employees without shifts and shifts without employees
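The full outer join and its "unmatched only" variant can be sketched in plain Python as three groups of rows. The employee names and shift labels below are hypothetical, and an unassigned shift is modeled as `None`, which is an assumption about how the Schedules table records it.

```python
employees = [{"employee": "Ana"}, {"employee": "Ben"}, {"employee": "Cy"}]
shifts = [
    {"shift": "Mon AM", "employee": "Ana"},
    {"shift": "Mon PM", "employee": "Ben"},
    {"shift": "Tue AM", "employee": None},  # nobody assigned
]

staff = {e["employee"] for e in employees}
assigned = {s["employee"] for s in shifts if s["employee"] is not None}

# 1. Employees with a shift, paired with that shift.
matched = [(s["employee"], s["shift"]) for s in shifts if s["employee"] in staff]
# 2. Employees who do not have a shift assigned.
employees_without_shift = [(name, None) for name in sorted(staff - assigned)]
# 3. Shifts that do not have an employee assigned.
shifts_without_employee = [
    (None, s["shift"]) for s in shifts if s["employee"] not in staff
]

full_outer = matched + employees_without_shift + shifts_without_employee
unmatched_only = employees_without_shift + shifts_without_employee
print(unmatched_only)
```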
Case 3: Range Join Conditions
Slowly Changing Dimensions
• Data that is best represented with start and end points.
Examples:
• Job term lengths
• Assignment durations
• Subscription times
Case 3: Range Join Conditions
Managers
Manager   Start Term   End Term
Linda     1/1/2018     3/31/2018
Faisal    4/1/2018     7/31/2018
Ruben     8/1/2018     9/30/2018
Sales
Sales   Date
10000   1/5/2018
24000   3/28/2018
6000    4/29/2018
6000    6/6/2018
30000   9/5/2018
Case 3: Range Join Conditions
Manager   Start Term   End Term    Sales   Date
Linda     1/1/2018     3/31/2018   10000   1/5/2018
Linda     1/1/2018     3/31/2018   24000   3/28/2018
Faisal    4/1/2018     7/31/2018   6000    4/29/2018
Faisal    4/1/2018     7/31/2018   6000    6/6/2018
Ruben     8/1/2018     9/30/2018   30000   9/5/2018
Join Conditions:
managers.start_term < sales.date
managers.end_term > sales.date
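The range join above can be reproduced with Python's built-in `sqlite3` module as a rough sketch. Dates are stored as ISO strings (an assumption made so that plain string comparison orders them correctly), and Ruben's term end is entered as 2018-09-30, the last day of September. The join conditions are the two strict inequalities from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE managers (manager TEXT, start_term TEXT, end_term TEXT);
    CREATE TABLE sales (sales INTEGER, date TEXT);
    INSERT INTO managers VALUES
        ('Linda',  '2018-01-01', '2018-03-31'),
        ('Faisal', '2018-04-01', '2018-07-31'),
        ('Ruben',  '2018-08-01', '2018-09-30');
    INSERT INTO sales VALUES
        (10000, '2018-01-05'), (24000, '2018-03-28'), (6000, '2018-04-29'),
        (6000,  '2018-06-06'), (30000, '2018-09-05');
""")

# Each sale is matched to the manager whose term contains the sale date.
rows = conn.execute("""
    SELECT managers.manager, sales.sales, sales.date
    FROM managers
    JOIN sales
      ON managers.start_term < sales.date
     AND managers.end_term > sales.date
    ORDER BY sales.date
""").fetchall()

# Total sales per manager, computed from the joined rows.
totals = {}
for manager, amount, _date in rows:
    totals[manager] = totals.get(manager, 0) + amount
print(totals)
```

Because both comparisons are strict, a sale dated exactly on a term's first or last day would match no manager; using `<=` and `>=` instead would include those boundary days.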
Case 3: Range Join Conditions
Scenario:
My employees tell me they are usually either overwhelmed or not busy at all.
Questions I want to explore:
Am I allocating my resources in a way that accurately reflects the restaurant’s needs?
DEMO!
Case 4: Self Joins
Terrence’s favorite case!
Self Joins are…
• seldom used.
• purely a reshaping operation – no external data is connected.
• a source of insight into how values in a column relate to other values in the same column.
Case 4: Self Joins
Scenario:
I’m still managing my restaurant (never give up!).
This time, I want to analyze my menu.
Questions I want to explore:
What menu items are commonly ordered together?
What do meals tend to look like compositionally?
The Recipe
What we need:
1. Values we want to explore (column of interest)
2. A way they are grouped (grouping column)
3. A unique ID for each row
Examples:
Medications or conditions, grouped into patients
Menu items, grouped into meals
Etc.
Join Conditions
Group_Column_1 = Group_Column_2
Column_of_Interest_1 != Column_of_Interest_2
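A minimal sketch of these two join conditions with Python's built-in `sqlite3` module. The `orders` table, with `row_id` as the unique ID, `meal_id` as the grouping column, and `item` as the column of interest, is a hypothetical layout chosen to match the recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (row_id INTEGER PRIMARY KEY, meal_id INTEGER, item TEXT);
    INSERT INTO orders VALUES
        (1, 1, 'Burger'), (2, 1, 'Fries'),
        (3, 2, 'Burger'), (4, 2, 'Fries'), (5, 2, 'Soda'),
        (6, 3, 'Salad');
""")

# The table is joined to itself under two aliases, a and b.
pairs = conn.execute("""
    SELECT a.row_id, a.item, b.item
    FROM orders AS a
    JOIN orders AS b
      ON a.meal_id = b.meal_id   -- Group_Column_1 = Group_Column_2
     AND a.item != b.item        -- Column_of_Interest_1 != Column_of_Interest_2
    ORDER BY a.row_id, b.item
""").fetchall()

for row in pairs:
    print(row)
```

Each item is paired with every different item from the same meal, so a meal of three distinct items yields six rows. The single-item meal (the Salad) has nothing to pair with and drops out of this inner self join.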
Viz: Most Common Pairings
Rows → CNTD(Row_ID_1)
Columns → Column_of_Interest_1
Filter → Column_of_Interest_2
Viz 2: Group Compositions
Rows → CNTD(Row_ID_1)
Columns → Group_Column_1
Color → Column_of_Interest_1
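The two viz recipes above boil down to distinct counts over the self-joined rows. The sketch below mimics them in plain Python; the `joined` rows are hypothetical self-join output in the form (Row_ID_1, Group_Column_1, Column_of_Interest_1, Column_of_Interest_2), and the function names are made up for this example.

```python
from collections import defaultdict

# Meal 1: Burger (row 1), Fries (row 2).
# Meal 2: Burger (rows 3 and 4), Soda (row 5).
joined = [
    (1, 1, "Burger", "Fries"),
    (2, 1, "Fries", "Burger"),
    (3, 2, "Burger", "Soda"),
    (4, 2, "Burger", "Soda"),
    (5, 2, "Soda", "Burger"),  # row 5 pairs with row 3 ...
    (5, 2, "Soda", "Burger"),  # ... and again with row 4
]


def most_common_pairings(rows, filter_item):
    """CNTD(Row_ID_1) per Column_of_Interest_1, filtered on Column_of_Interest_2."""
    ids = defaultdict(set)
    for row_id_1, _group, item_1, item_2 in rows:
        if item_2 == filter_item:
            ids[item_1].add(row_id_1)
    return {item: len(row_ids) for item, row_ids in ids.items()}


def group_compositions(rows):
    """CNTD(Row_ID_1) per (Group_Column_1, Column_of_Interest_1)."""
    ids = defaultdict(set)
    for row_id_1, group, item_1, _item_2 in rows:
        ids[(group, item_1)].add(row_id_1)
    return {key: len(row_ids) for key, row_ids in ids.items()}


print(most_common_pairings(joined, "Burger"))
print(group_compositions(joined))
```

The distinct count matters because the self join repeats a row once per partner: the Soda in meal 2 appears twice in `joined` but should count once.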
Questions?