Tableau Conference 2018 - The Joy in Joins

Post on 16-Oct-2020


Join us for Joins (The Joy in Joins!!)

Terrence Maas, Software Engineer

tmaas@tableau.com

#TC18

Joanna Chen, Software Engineer

jochen@tableau.com

Agenda

Joins – Why the hype?

Intro to Tableau Prep

Practical Join Cases:

1. Cleaning and Filtering

2. Outer joins – Capture ALL the Values!

3. Range joins – Visualizing Occupancy

4. Self Joins – Correlations and Compositions

Q&A

What’s so great about joins?

Why do we use joins?

Connect Data

• Fundamental purpose of joins.

• Data derives most of its value by connecting to other data.

Reshape Data

• Different kinds of joins enable different kinds of analysis.

Select Data

• Join Conditions provide a bar for your data to meet.

• Tableau Prep uses this to help you clean your data.

Terrence’s Restaurant

Tables:

Employees

Schedules

Orders

Menu Items

Questions:

• Am I collecting good quality data?

• When is my restaurant busiest?

• What menu items are most popular? What about pairings?

Tableau Prep!

Tableau Prep

Tableau’s newest product offering

Released in April 2018

Mission Statement

Tableau Prep empowers more people to get to analysis faster by helping them quickly and confidently combine, shape, and clean their data.

Tableau Prep

Connection Pane

Flow Pane

Steps in the Flow

Profile Pane

Changes Pane

Data Grid

Joins in Tableau Prep

Configuring Your Join in Prep

Join clauses describe how columns in the two tables relate to each other.

Join types control which rows from the two tables are included or excluded, based on the join clauses.

The Summary of Join Results shows you the distribution of values that are included and excluded from the tables in the join.
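The idea behind the Summary of Join Results can be sketched in a few lines of Python. This is only an illustration of the concept, not how Tableau Prep computes it; the tables, the `employee` column, and the `join_summary` helper are all hypothetical.

```python
# Hypothetical sample data; table and column names are illustrative only.
orders = [
    {"order_id": 1, "employee": "Ana"},
    {"order_id": 2, "employee": "Ben"},
    {"order_id": 3, "employee": "Bne"},  # typo: matches no employee
]
employees = [
    {"employee": "Ana"},
    {"employee": "Ben"},
    {"employee": "Cy"},  # took no orders
]


def join_summary(left, right, key):
    """Count rows on each side that an inner join on `key` includes or excludes."""
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    matched = left_keys & right_keys  # the join clause: left.key = right.key
    return {
        "left_included": sum(row[key] in matched for row in left),
        "left_excluded": sum(row[key] not in matched for row in left),
        "right_included": sum(row[key] in matched for row in right),
        "right_excluded": sum(row[key] not in matched for row in right),
    }


summary = join_summary(orders, employees, "employee")
print(summary)
```

The excluded counts are the interesting part: they point at rows worth inspecting before choosing a join type.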

Joins in Tableau Prep

Join Clauses Pane

• Displays the columns from your join clause(s) side by side in 1 view

• Unmatched values are highlighted in red

Joins: Getting the Most Out of Your Data

Case 1: Cleaning and Filtering

In every join…

Rows are matched according to the join condition.

In Tableau Prep…

We pay equal attention to both matched and unmatched data.

In general…

If a table is a source of truth, it can be a powerful tool for cleaning your data.

Case 1: Cleaning and Filtering

Scenario:

• Terrence’s employees manually record restaurant orders.

• Terrence has a table of Employees, which he knows is accurate and error-free.

• He has a table of Orders that may contain errors.

DEMO!

Case 1: Cleaning and Filtering

Summary

• A table with reliable data can be used in a join to verify the quality of another table.

• In a join, mismatched values are accented in red.

• Clean values directly in the join pane.
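Outside Tableau Prep, the same "source of truth" idea might look like the sketch below. It assumes a trusted list of employee names and an Orders table with a hypothetical `server` column, and it uses fuzzy matching from Python's standard `difflib` module to suggest a fix for near misses. The 0.8 cutoff is an arbitrary choice for this example.

```python
import difflib

# Trusted Employees table (hypothetical values).
valid_names = ["Joanna", "Terrence", "Linda"]

# Manually recorded Orders table that may contain errors.
orders = [
    {"order_id": 1, "server": "Joanna"},
    {"order_id": 2, "server": "Terence"},  # misspelled
    {"order_id": 3, "server": "Linda"},
    {"order_id": 4, "server": "???"},      # not recoverable
]


def clean_orders(orders, valid_names, cutoff=0.8):
    """Split orders into rows that match (or nearly match) a trusted name and rows that do not."""
    valid = set(valid_names)
    cleaned, rejected = [], []
    for row in orders:
        name = row["server"]
        if name in valid:
            cleaned.append(row)
            continue
        # Unmatched value: look for a close match in the trusted table.
        guess = difflib.get_close_matches(name, valid_names, n=1, cutoff=cutoff)
        if guess:
            cleaned.append({**row, "server": guess[0]})
        else:
            rejected.append(row)
    return cleaned, rejected


cleaned, rejected = clean_orders(orders, valid_names)
print(cleaned)
print(rejected)
```

In practice an automatic correction like this would deserve a manual review, much as the join pane lets you inspect each unmatched value before changing it.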

Case 2: Outer Joins

Inner Joins

• Default join type

• Only rows that match our join clause(s) are kept in the join result

Inner Join

Case 2: Outer Joins

Outer Joins

• Unmatched values can be important.

• Outer joins allow us to include unmatched data from one or more tables.

Left Join Right Join Full Outer Join

Case 2: Outer Joins

Filling out dimensions/axes

• All possible values might not show up in the data.

• Some data is best represented on a complete axis / dimension.

Including unmatched values

• Outer joins allow us to include unmatched values in our result.

Case 2: Outer Joins

Back to Terrence’s Restaurant…

• How is the menu doing?

• What about unordered items?

Case 2: Outer Joins

Join Orders Table with Menu Table

• The Orders table alone only shows items ordered at least once.

• Use the rows from the Menu table to “fill in” the missing items not present in the Orders table.

Orders Menu

DEMO!

Case 2: Outer Joins

Summary

• Not every item on the menu has been ordered.

• Join Orders with Menu so that the complete set of menu items appears when we visualize in Tableau.

• Change the join type from inner join to right join to include the unmatched values.

• Takeaway: use outer joins to fill out a dimension
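A minimal sketch of this right join in plain Python, assuming hypothetical Orders and Menu tables that share an `item` column. Every menu row is kept, and items that were never ordered get an empty (`None`) order ID.

```python
# Hypothetical sample tables; names and values are illustrative only.
menu = [{"item": "Burger"}, {"item": "Salad"}, {"item": "Soup"}, {"item": "Pie"}]
orders = [
    {"order_id": 1, "item": "Burger"},
    {"order_id": 2, "item": "Soup"},
    {"order_id": 3, "item": "Burger"},
]


def right_join(orders, menu, key="item"):
    """Keep every Menu row; attach matching Orders rows where they exist."""
    orders_by_item = {}
    for order in orders:
        orders_by_item.setdefault(order[key], []).append(order)

    rows = []
    for entry in menu:
        matches = orders_by_item.get(entry[key], [])
        if matches:
            rows.extend({**order, **entry} for order in matches)
        else:
            # Unmatched menu item: keep it, with no order attached.
            rows.append({"order_id": None, **entry})
    return rows


joined = right_join(orders, menu)

# Count orders per item; unordered items appear with a count of 0.
times_ordered = {}
for row in joined:
    times_ordered[row["item"]] = times_ordered.get(row["item"], 0) + (row["order_id"] is not None)
print(times_ordered)
```

With an inner join, Salad and Pie would be missing from the result altogether instead of showing up with a count of zero.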

Case 2: Outer Joins

Full Outer Joins

• Not commonly used.

• Includes unmatched data from both tables in join result.

Full Outer Join

Case 2: Outer Joins

Back at Terrence’s Restaurant…

• Some employees are not assigned a shift

• Some shifts do not have employees assigned

DEMO!

Case 2: Outer Joins

Full Outer Joins

• Every employee who has a shift assigned and every shift that has an employee assigned

• Every employee who does not have a shift assigned

• Every shift that does not have an employee assigned

Case 2: Outer Joins

Summary

• Full Outer Join Employees with Schedules

• Change join type to ‘unmatched only’ to see employees without shifts and shifts without employees
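The full outer join and its "unmatched only" variant can be sketched as follows. The employee names, shift labels, and helper function are hypothetical. Each result row is a pair of (employee from the Employees table, shift from the Schedules table), with `None` marking the missing side.

```python
# Hypothetical sample data.
employees = ["Ana", "Ben", "Cy"]
schedules = [("Mon AM", "Ana"), ("Mon PM", "Ben"), ("Tue AM", None)]  # Tue AM has nobody assigned


def full_outer_join(employees, schedules):
    """Return (employee, shift) pairs, keeping unmatched rows from both tables."""
    known = set(employees)
    rows, scheduled = [], set()
    for shift, assigned in schedules:
        if assigned in known:
            rows.append((assigned, shift))  # matched on both sides
            scheduled.add(assigned)
        else:
            rows.append((None, shift))      # shift without a known employee
    # Employees that never matched any shift.
    rows.extend((name, None) for name in employees if name not in scheduled)
    return rows


joined = full_outer_join(employees, schedules)

# "Unmatched only": keep just the rows where one side is missing.
unmatched_only = [row for row in joined if None in row]
print(unmatched_only)
```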

Case 3: Range Join Conditions

Slowly Changing Dimensions

• Data that is best represented with start and end points.

Examples:

• Job term lengths

• Assignment durations

• Subscription times

Case 3: Range Join Conditions

Managers

Manager   Start Term   End Term
Linda     1/1/2018     3/31/2018
Faisal    4/1/2018     7/31/2018
Ruben     8/1/2018     9/30/2018

Sales

Sales     Date
10000     1/5/2018
24000     3/28/2018
6000      4/29/2018
6000      6/6/2018
30000     9/5/2018

Case 3: Range Join Conditions

Manager   Start Term   End Term    Sales   Date
Linda     1/1/2018     3/31/2018   10000   1/5/2018
Linda     1/1/2018     3/31/2018   24000   3/28/2018
Faisal    4/1/2018     7/31/2018   6000    4/29/2018
Faisal    4/1/2018     7/31/2018   6000    6/6/2018
Ruben     8/1/2018     9/30/2018   30000   9/5/2018

Join Conditions:

managers.start_term < sales.date

managers.end_term > sales.date
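The range join above can be reproduced with a short Python sketch using the Managers and Sales values from the slide. Ruben's end date is written as 9/30/2018 here, since September has 30 days. The tuple layout and the `range_join` helper are assumptions made for this example.

```python
from datetime import date

# (manager, start_term, end_term)
managers = [
    ("Linda", date(2018, 1, 1), date(2018, 3, 31)),
    ("Faisal", date(2018, 4, 1), date(2018, 7, 31)),
    ("Ruben", date(2018, 8, 1), date(2018, 9, 30)),
]

# (sales, date)
sales = [
    (10000, date(2018, 1, 5)),
    (24000, date(2018, 3, 28)),
    (6000, date(2018, 4, 29)),
    (6000, date(2018, 6, 6)),
    (30000, date(2018, 9, 5)),
]


def range_join(managers, sales):
    """Pair each sale with the manager whose term contains the sale date."""
    return [
        (name, start, end, amount, sold_on)
        for name, start, end in managers
        for amount, sold_on in sales
        # Same conditions as on the slide. They are strict, so a sale made
        # exactly on a start or end date would be excluded; use <= and >=
        # to include the boundary days.
        if start < sold_on and end > sold_on
    ]


joined = range_join(managers, sales)

# Total sales per manager.
totals = {}
for name, _, _, amount, _ in joined:
    totals[name] = totals.get(name, 0) + amount
print(totals)
```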

Case 3: Range Join Conditions

Scenario:

My employees tell me they are usually either overwhelmed or not busy at all.

Questions I want to explore:

Am I allocating my resources in a way that accurately reflects the restaurant’s needs?

DEMO!

Case 4: Self Joins

Terrence’s favorite case!

Self Joins are…

• seldom used.

• purely a reshaping operation – no external data is connected.

• a source of insight into how values in a column relate to other values in the same column.

Case 4: Self Joins

Scenario:

I’m still managing my restaurant (never give up!).

This time, I want to analyze my menu.

Questions I want to explore:

What menu items are commonly ordered together?

What do meals tend to look like compositionally?

The Recipe

What we need:

1. Values we want to explore (column of interest)

2. A way they are grouped (grouping column)

3. A unique ID for each row

Examples:

Medications or conditions, grouped into patients

Menu items, grouped into meals

Etc.

Join Conditions

Group_Column_1 = Group_Column_2

Column_of_Interest_1 != Column_of_Interest_2
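As a rough illustration, the two join conditions can be applied to a small table in Python. In this sketch `meal_id` plays the role of the grouping column, `item` is the column of interest, and `row_id` is the unique ID. All three names and the sample rows are hypothetical.

```python
# Hypothetical order lines: one row per item ordered.
order_lines = [
    {"row_id": 1, "meal_id": "A", "item": "Burger"},
    {"row_id": 2, "meal_id": "A", "item": "Fries"},
    {"row_id": 3, "meal_id": "B", "item": "Burger"},
    {"row_id": 4, "meal_id": "B", "item": "Fries"},
    {"row_id": 5, "meal_id": "B", "item": "Shake"},
    {"row_id": 6, "meal_id": "C", "item": "Salad"},
]


def self_join(rows):
    """Join the table to itself: same meal, different item."""
    return [
        {
            "row_id_1": a["row_id"],
            "meal_id": a["meal_id"],
            "item_1": a["item"],
            "item_2": b["item"],
        }
        for a in rows
        for b in rows
        if a["meal_id"] == b["meal_id"]  # Group_Column_1 = Group_Column_2
        and a["item"] != b["item"]       # Column_of_Interest_1 != Column_of_Interest_2
    ]


pairs = self_join(order_lines)
for pair in pairs:
    print(pair)
```

Each pairing appears twice, once in each direction, and a meal with a single item (meal C here) drops out of the result entirely.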

Viz: Most Common Pairings

Rows → CNTD(Row_ID_1)

Columns → Column_of_Interest_1

Filter → Column_of_Interest_2

Viz 2: Group Compositions

Rows → CNTD(Row_ID_1)

Columns → Group_Column_1

Color → Column_of_Interest_1
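The two views amount to distinct counts over the self-join output. The sketch below mimics CNTD(Row_ID_1) in Python on a hypothetical join result. The tuple layout and the `count_distinct` helper are assumptions for this example, not Tableau functions.

```python
from collections import defaultdict

# Hypothetical self-join output: (row_id_1, meal_id, item_1, item_2)
pairs = [
    (1, "A", "Burger", "Fries"),
    (2, "A", "Fries", "Burger"),
    (3, "B", "Burger", "Fries"),
    (3, "B", "Burger", "Shake"),
    (4, "B", "Fries", "Burger"),
    (4, "B", "Fries", "Shake"),
    (5, "B", "Shake", "Burger"),
    (5, "B", "Shake", "Fries"),
]


def count_distinct(rows, key, keep=lambda row: True):
    """Count distinct row_id_1 values per group, like CNTD(Row_ID_1)."""
    seen = defaultdict(set)
    for row in rows:
        if keep(row):
            seen[key(row)].add(row[0])
    return {group: len(ids) for group, ids in seen.items()}


# Viz 1: filter on item_2, then count per item_1 ("what is ordered with a Burger?").
with_burger = count_distinct(pairs, key=lambda r: r[2], keep=lambda r: r[3] == "Burger")

# Viz 2: count per (meal, item_1) to see what each meal is made of.
composition = count_distinct(pairs, key=lambda r: (r[1], r[2]))

print(with_burger)
print(composition)
```

A distinct count matters because the self join repeats each original row once per partner item; a plain row count would overstate items in larger meals.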

Please complete the session survey from the My Evaluations menu in your TC18 app.

Questions?