
This tutorial was developed for the Visualizing Venice 2016 Summer Workshop.

http://www.dukewired.org/visualizing-venice-the-ghetto-of-venice/

Instructors: Mark J.V. Olson, Victoria Szabo

Teaching Assistants: Ludovica Galeazzo, Hannah L. Jacobs, Edward Triplett http://www.dukewired.org/

It is licensed under CC-BY-NC-SA 3.0 US.

https://creativecommons.org/licenses/by-nc-sa/3.0/us/

Please use, reuse, mix, and cite your source(s)!


Data Visualizations in RAW

About this Tutorial

RAW is an open web application that creates custom vector-based visualizations built on the D3.js JavaScript library using a graphical user interface in your browser.* D3.js is a JavaScript library that converts data into a visualization of your design without the need for a proprietary framework.†

Part I: Preparing the Data

For this tutorial, we will return to the historical data on shops in the Ghetto.

1. Locate and open the data, “Shops_Raw.csv”, in Microsoft Excel. This spreadsheet is a compilation of the data from 1661, 1712, and 1739. Note the three columns: Tipologia (type), Anno (year), and Nome (name).

2. Close the file and locate it in Finder. Right-click on it, and select Open With… → Atom. The data has been saved in Excel as a .csv (comma-separated value) file. In a text editor, we can see the raw form of the data, in which each line corresponds to a record, and columns are separated by commas. We will need this data format for creating visualizations with RAW.

* Language paraphrased from http://raw.densitydesign.org/.

† Language paraphrased from http://d3js.org/.
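The record-per-line, comma-separated structure described in Part I can also be read programmatically. The following is a minimal sketch using Python's standard csv module. The sample rows are hypothetical stand-ins and are not taken from the real Shops_Raw.csv; only the three column names come from the tutorial.

```python
import csv
import io

# Hypothetical sample rows; the real Shops_Raw.csv values are not reproduced here.
SAMPLE = """Tipologia,Anno,Nome
Food,1661,Baker
Food,1712,Baker
Textiles,1661,Tailor
"""

def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(SAMPLE)
for record in records:
    # Each line of the file becomes one record with three named fields.
    print(record["Tipologia"], record["Anno"], record["Nome"])
```

Note that every value is read as text, which is why tools such as RAW must guess whether a column holds numbers, strings, or dates.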


Part II: Creating a RAW Visualization

1. Navigate to http://raw.densitydesign.org/.

2. Click “Use it now!” to open the four-step visualization process.

3. In the window that loads, we can paste in our data. In the text editor, select all of your data. Copy it (Edit → Copy) and paste it (Edit → Paste) into the RAW text field.

4. RAW will attempt to read your data and will give you a thumbs up if it has successfully read (parsed) your data.

5. Scroll down in your browser to reveal the next step: choosing a visualization type. There are 16 visualization types available as well as an option to add your own visualization style created in D3.js.


6. Each visualization type can show a different aspect of our data. We might choose to highlight change over time, typological quantities, business quantities, etc. In Part IV below, you can find information on each visualization type and how you might choose to use it.

7. Let’s first try visualizing quantities of business types. Select the circle packing visualization in the “Choose a Chart” section.

8. A third step will appear below. Scroll down the page.

9. In the “Map your dimensions” section, we must choose how RAW creates a visualization using the three columns of data. The dimensions, attributes that affect how a chart will look (size, color, label, groupings, etc.), change depending on the chart you choose.

10. The column headers are listed in green on the left. Dimensions are listed to the right. Note that each dimension type requires certain types of data: numbers, strings, and/or dates. RAW attempts to identify each column as one of these types and lists the detected type next to the column name. If a dimension does not accept a data type, that column’s box will turn yellow when it is added to the dimension.

11. Drag and drop a column header into a dimension box: start by adding “Nome” to Hierarchy and Color. (It is often possible to apply a column to multiple dimensions. It may also be possible in some cases to leave some dimensions empty.)


12. Scroll down now to view and customize the visualization you’ve created.

In your visualization, each colored bubble should correspond to a business type (Nome), and the size of each bubble is determined by the number of times that type is listed in the dataset.

13. Does this visualization accomplish what we’ve set out to show? Might there be another way to go about it?

14. Let’s now load into RAW the data saved in 1661_Shops_Raw.csv.

15. This data contains the list of Jewish business types present in and around the Ghetto in 1661. Note, however, that the Anno column is gone, and a new “Quantity” column has been added. Each business type appears only once in the data, and the “Quantity” column gives the number of shops present.

16. Let’s see what happens when we use this data to create a new clustered force layout visualization using the following as dimensions:


17. This time, we should see a visualization that gives us only the quantities of business types for one year, color coded by Tipologia/type. In this case, removing the data from other years and adjusting the data so that we include a numbers data type made it possible for us to be more explicit in the visualization we’ve created. We’ve also chosen a slightly different visualization format that replaces the “Hierarchy” dimension with a much more democratic “Clusters” dimension, as there is no need to show any hierarchy in this case.

18. We can now customize the visualization’s appearance in the left menu in “Customize your Visualization”.

19. Here we can adjust the visualization’s height and width, the padding between nodes and clusters, and the colors chosen.

20. When you’re satisfied with these customizations, move to the final sharing section where you may choose to download the visualization as a .png, .svg, or .json file or to embed the visualization on a web page using XML code provided in the “Embed Code” text box.

21. Note that it is possible to create further customizations in the .svg, .json, and XML formats if you are familiar with XML or JSON.
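The change in data structure described in steps 14 and 15 (dropping Anno and adding a Quantity column for a single year) can be sketched in Python as a counting step. This is only an illustration of the idea, not the method used to prepare 1661_Shops_Raw.csv: the sample rows, the output column order, and the helper names count_shops and to_csv are all assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the three-column layout of Shops_Raw.csv.
SAMPLE = """Tipologia,Anno,Nome
Food,1661,Baker
Food,1661,Baker
Textiles,1661,Tailor
Food,1712,Baker
"""

def count_shops(text, year):
    """Return (Tipologia, Nome, Quantity) rows for a single year."""
    counts = Counter(
        (row["Tipologia"], row["Nome"])
        for row in csv.DictReader(io.StringIO(text))
        if row["Anno"] == year
    )
    return [(tipologia, nome, qty)
            for (tipologia, nome), qty in sorted(counts.items())]

def to_csv(rows):
    """Write the counted rows back out as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Tipologia", "Nome", "Quantity"])
    writer.writerows(rows)
    return out.getvalue()

rows_1661 = count_shops(SAMPLE, "1661")
print(to_csv(rows_1661))
```

Because each business type now appears once with a number beside it, the Quantity column can be mapped to a Size dimension.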

Part III: Visualizing Time in RAW

We’ve made one visualization that shows different types of shops during one year, but what if we want to add the time dimension to a visualization? We can do this in RAW using the Bump Chart, Small Multiples, or Streamgraph.

1. Load the dataset “Shows_Time_Raw.csv” in RAW.

2. Note again that the data structure has been adjusted slightly from the original file to include quantities in the far right field. Note also that the date format has changed to reflect RAW’s accepted date formats.

3. This time, choose bump chart and set your dimensions to:

Group: Nome
Date: Anno
Size: Quantity

4. View your chart. What does it tell you about changes in shops across the time span? What is not made clear through this visualization?


5. Try the streamgraph visualization. What does it show that the bump chart does not? What about the small multiples?

One next step for this visualization might be to consider comparing only a few of the business types listed. What other kinds of visualizations can you envision from this dataset?
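To make the Group, Date, and Size mapping from Part III more concrete, here is a small Python sketch that counts shops per name and year and then orders the names by quantity within each year, which is roughly the comparison a bump chart invites. The records and the helper names quantities_by_year and ranking are hypothetical, and RAW's own drawing logic may differ.

```python
from collections import Counter, defaultdict

# Hypothetical (Nome, Anno) pairs, one per shop record.
RECORDS = [
    ("Baker", "1661"), ("Baker", "1661"), ("Tailor", "1661"),
    ("Baker", "1712"), ("Tailor", "1712"), ("Tailor", "1712"), ("Tailor", "1712"),
]

def quantities_by_year(records):
    """Count records per (Nome, Anno) and nest the counts by year."""
    by_year = defaultdict(dict)
    for (nome, anno), qty in Counter(records).items():
        by_year[anno][nome] = qty
    return dict(by_year)

def ranking(by_year):
    """For each year, list groups from largest to smallest quantity."""
    return {
        anno: sorted(groups, key=lambda nome: (-groups[nome], nome))
        for anno, groups in by_year.items()
    }

by_year = quantities_by_year(RECORDS)
ranks = ranking(by_year)
for anno in sorted(ranks):
    print(anno, ranks[anno])
```

In this invented sample the two names swap places between the two years, which is the kind of change in order that a bump chart makes visible.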

Part IV: RAW Visualization Types

Alluvial Diagram (Fineo-like)

Best for: showing relationships between individuals and/or categories

Dimensions: Steps, Size

Data types: numbers, strings, dates

Alluvial diagrams show correlations between categorical dimensions by visually linking elements that share the same categories. They can be used to show the evolution of clusters (movement of an element from one type of category to another) and to show groupings of elements that share common categories. Alluvial diagrams are also known as Sankey diagrams or bipartite graphs. Compare to Parallel Coordinates.

Examples:

Raw’s design of the Alluvial Diagram is inspired by http://bost.ocks.org/mike/sankey.

Another example of the Alluvial Diagram in action can be seen in the People of Medieval Scotland 1093-1314 Relationships Explorer: http://db.poms.ac.uk/labs/connectionscloud.

Bump Chart

Best for: comparing quantitative changes over time between multiple individuals/groups

Dimensions: Group, Date, Size

Data types: numbers, strings, dates

Raw’s Bump Chart shows viewers quantitative change over time as compared between different data groups. These “groups” themselves can be numbers, strings, or dates and could represent a single entity (such as one person) or a group of entities (as in the first example, people born in a particular area in the US).

Examples:

Raw’s design of the Bump Chart is inspired by the New York Times’ visualization: http://www.nytimes.com/interactive/2014/08/13/upshot/where-people-in-each-state-were-born.html?_r=0.


Popular baby names: http://www.visualcinnamon.com/babynamesus.

Circle Packing

Best for: showing hierarchical structures and quantitative relationships between elements based on size and position

Dimensions: hierarchy, size, color, label

Data types: numbers, strings, dates

Circle packing, or nested circles, enables users to show hierarchical and quantitative relationships simultaneously. This visualization is particularly effective for showing proportions between elements at different levels of hierarchy. See also Clustered Force Layout.

Examples:

Raw’s design of the Circle Packing chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4063530.

Jane Austen adaptations by year: https://janeaustendatavisualization.wordpress.com/2015/03/28/circle-packing-colour-pride-prejudice-adaptations-data/.
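When circles stand for quantities, it is the area of each circle, not its radius, that is usually made proportional to the value, so that a value four times larger produces a circle with twice the radius. The sketch below illustrates that relationship in Python. It assumes area-proportional sizing, and the quantities, the area_per_unit scale, and the helper radius_for are hypothetical; it does not reproduce RAW's layout.

```python
import math

def radius_for(value, area_per_unit=100.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    return math.sqrt(value * area_per_unit / math.pi)

# Hypothetical quantities for two business types.
quantities = {"Baker": 4, "Tailor": 1}
radii = {name: radius_for(qty) for name, qty in quantities.items()}

for name, radius in radii.items():
    print(name, round(radius, 2))
```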

Circular Dendrogram

Best for: showing a large (wide) non-weighted hierarchy in a more compact way, especially when the hierarchy begins with more than one top level category

Dimensions: hierarchy

Data types: numbers, strings, dates

Dendrograms are tree-like diagrams used to represent the distribution of a non-weighted hierarchical clustering. This circular dendrogram places the highest hierarchical level, the single “root” above your data’s top categories, in the center. (If you have only one top category, this will appear as the root.) Each concentric ring moving outward is a progressive step down in the hierarchy. See also Cluster Dendrogram.

Examples:

Raw’s design of the Circular Dendrogram chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4063570

Visualizing My Craft Beer Consumption: http://vizthinker.com/visualizing-my-craft-beer-consumption-with-circular-dendrograms/


Cluster Dendrogram

Best for: showing a non-weighted hierarchy

Dimensions: hierarchy

Data types: numbers, strings, dates

Dendrograms are tree-like diagrams used to represent the distribution of a non-weighted hierarchical clustering. The different depth levels represented by each node are visualized on the horizontal axis with the highest level of the hierarchy appearing on the left and radiating down the hierarchy to the right. See also Circular Dendrogram and Reingold-Tilford Tree.

Examples:

Raw’s design of the Cluster Dendrogram chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4063570

Unified Astronomy Thesaurus: http://www.altbibl.io/astrothesaurus/uat/dendrogram.html

Clustered Force Layout

Best for: categorizing and comparing individual elements

Dimensions: clusters, size, label, color

Data types: numbers, strings, dates

Similar to Circle Packing, Clustered Force Layout enables users to show simultaneously categorical and quantitative relationships so that elements can be analyzed both within and across categories. See also Circle Packing.

Examples:

Raw’s design of the Clustered Force Layout chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/7882658

World’s Biggest Data Breaches: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/

Convex Hull

Best for: showing a dataset’s overall quantitative and/or temporal value on two axes

Dimensions: X axis, Y axis

Data types: numbers, dates

A convex hull is the smallest convex shape that encloses all the points of a scatter plot, in which two values (quantitative or temporal) are compared for each element. Each element is represented by a point in the polygon.


Examples: Raw’s design of the Convex Hull chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4341699
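For readers curious about how a convex hull can be found, the following Python sketch uses a well-known method (Andrew's monotone chain). It is offered only as an illustration of the concept; it is not a claim about which algorithm RAW or D3.js uses, and the sample points are invented.

```python
def cross(o, a, b):
    """Positive if the turn o -> a -> b is counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make the boundary turn inward.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

# Four corners of a square plus one interior point (hypothetical data).
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(points)
print(hull)
```

The interior point (1, 1) is not part of the hull: it lies inside the polygon rather than on its boundary.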

Delaunay Triangulation

Best for: showing a dataset’s overall quantitative and/or temporal value on two axes

Dimensions: X axis, Y axis

Data types: numbers, dates

This visualization is a combination of the Convex Hull and Voronoi tessellation. It shows elements quantitatively by comparing two of their quantitative values. Each element in the dataset is represented by a vertex. Vertices connect to form triangles, which together make up a planar mesh representing the entire dataset. As with the Convex Hull, this mesh is the smallest polygon possible for this dataset.

See also Voronoi tessellation and Convex Hull.

Examples: Raw’s design of the Delaunay Triangulation chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4341156

Hexagonal Binning

Best for: showing a large dataset’s most common values (quantitative or temporal) in two variables (or categories)

Dimensions: X axis, Y axis

Data types: numbers, dates

Like a scatter plot, except that each hexagon represents a region of the chart where two values (quantitative or temporal) appear together. The darker the hexagon, the more often the combination of numbers and/or dates occurs in the dataset.

Examples: Raw’s design of the Hexagonal Binning chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4248145

Parallel Coordinates

Best for: comparing two or more quantitative attributes across a dataset while also using color to show categories

Dimensions: dimensions (columns), color

Data types: numbers, dates, string

Parallel Coordinates charts are a great way to analyze multivariate data, or data with multiple different values. Each column (dimension) in Parallel Coordinates represents a quantitative value, while the color can be qualitative. An element is represented by a line that moves from one column to the next. The line moves up or down along the columns depending on its value in each dimension. Compare to Alluvial Diagrams.

Examples: Raw’s design of the Parallel Coordinates chart is inspired by Jason Davies’s demonstration: http://bl.ocks.org/jasondavies/1341281

Women Writers Project: http://www.wwp.northeastern.edu/wwo/lab/textbase.html
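The idea that a line “moves up or down along the columns” can be sketched by rescaling each column to a common 0 to 1 range, so that every element has one height per column. The Python below is a minimal illustration under that assumption. The rows, the column names, and the helper normalize_columns are hypothetical, and actual chart libraries may scale their axes differently.

```python
def normalize_columns(rows, columns):
    """Scale each numeric column to the range 0..1 (0 = column minimum)."""
    scaled = [dict() for _ in rows]
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for out, row in zip(scaled, rows):
            # A column with a single repeated value is placed mid-axis.
            out[col] = 0.5 if span == 0 else (row[col] - lo) / span
    return scaled

# Hypothetical elements with two quantitative attributes each.
rows = [
    {"shops": 2, "year": 1661},
    {"shops": 6, "year": 1712},
    {"shops": 4, "year": 1739},
]
lines = normalize_columns(rows, ["shops", "year"])
for line in lines:
    print(line)
```

Each resulting dict gives the heights at which one element's line crosses the “shops” and “year” columns.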

Reingold-Tilford Tree

Best for: showing a non-weighted hierarchy

Dimensions: hierarchy

Data types: numbers, dates, string

As with the circular dendrogram, this visualization shows hierarchies, beginning with a top-level “root”: the top of your hierarchy if there is only one top-level value, or a step above your dataset’s top levels if there is more than one. The root appears at the left of the chart, and data categories (attributes) descend in hierarchical order to the right. Compare to the Cluster Dendrogram.

Examples: Raw’s design of the Reingold-Tilford Tree chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4339184.

Scatter Plot

Best for: showing relationships between two quantitative or temporal values for an element

Dimensions: X axis, Y axis, size, color, label

Data types: numbers, dates, string

A scatter plot is a type of mathematical diagram that compares values of two variables (categories) for a dataset on an X-Y graph (Cartesian coordinates). Each element in the dataset is a point on the graph. The point’s position is determined by the X and Y values of the element.

Small Multiples (Area)

Best for: showing quantitative change over time

Dimensions: group, date, size

Data types: numbers, dates, string

A small multiple is a series of small similar graphics or charts, in this case a type of line graph. See also Streamgraph.


Examples: Raw’s design of the Small Multiples chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/9490313.

The Rise and Decline of Ask MetaFilter: http://www.projects.flowingdata.com/tut/linked_small_multiples_demo/.

Streamgraph

Best for: showing quantitative change over time

Dimensions: group, date, size

Data types: numbers, dates, string

A streamgraph can show continuous change over time and is comparable to a stacked area chart whose layers flow around a central axis. See also Small Multiples.

Examples: Raw’s design of the Streamgraph chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4060954.

Will Turman’s D3 Interactive Streamgraph: http://bl.ocks.org/WillTurman/4631136.
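One way to picture how a streamgraph is built is to stack each group's quantities on top of one another at every time step and then shift the whole stack so that it is centered on zero. The Python sketch below shows that centering rule only. It is an assumption for illustration, the sample series are invented, and RAW or D3.js may place the baseline differently.

```python
def stack_layers(series, centered=True):
    """Return, per layer, a list of (lower, upper) bounds at each time step."""
    n_steps = len(series[0])
    bounds = [[] for _ in series]
    for t in range(n_steps):
        total = sum(layer[t] for layer in series)
        # Centering starts the stack at minus half of the total height.
        base = -total / 2 if centered else 0.0
        for i, layer in enumerate(series):
            bounds[i].append((base, base + layer[t]))
            base += layer[t]
    return bounds

# Two hypothetical groups measured at two dates.
series = [[2, 1], [1, 3]]
bands = stack_layers(series)
for band in bands:
    print(band)
```

With centered=False the same function produces an ordinary stack that starts from zero, which makes the difference between the two layouts easy to compare.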

Treemap

Best for: showing categorized hierarchies

Dimensions: hierarchy, size, color, label

Data types: numbers, dates, string

A treemap shows hierarchies and proportions between a dataset’s elements. Hierarchical levels are clustered together. Large rectangles represent categories and are subdivided into further rectangles, stepping down the hierarchy until individual elements are reached.

Examples: Raw’s design of the Treemap chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4063582.
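The subdivision of a rectangle in proportion to values can be sketched with the simplest treemap rule, sometimes called “slice”: cut the rectangle into side-by-side strips whose widths follow the values. This Python sketch is a simplification for illustration. Treemap tools often use more elaborate rules that keep rectangles closer to squares, and the category names, values, and the helper slice_layout are hypothetical.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into vertical strips with areas proportional to values.

    Returns a dict mapping each name to (x, y, width, height).
    """
    total = sum(value for _, value in items)
    rects = {}
    offset = x
    for name, value in items:
        strip_width = width * value / total
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects

# Two hypothetical categories sharing a 400 x 100 rectangle.
rects = slice_layout([("Food", 3), ("Textiles", 1)], 0, 0, 400, 100)
for name, rect in rects.items():
    print(name, rect)
```

Applying the same function again inside each strip, with that category's own elements, gives the nested rectangles of a deeper hierarchy.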

Voronoi Tessellation

Best for: showing a dataset’s overall quantitative and/or temporal value on two axes

Dimensions: X axis, Y axis, color

Data types: numbers, dates, string

Voronoi Tessellation divides the chart into polygons, one around each point in a dataset. Each polygon contains the area that is closer to its point than to any other point. The points themselves are positioned by two variables (categories), as in a scatter plot. The polygons are helpful for seeing distance between points. See also Delaunay Triangulation.


Examples: Raw’s design of the Voronoi Tessellation chart is inspired by Mike Bostock’s demonstration: http://bl.ocks.org/mbostock/4060366.
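The property that defines a Voronoi polygon is that every location inside it is closer to that polygon's point than to any other point. The Python sketch below tests this property directly by finding the nearest data point for any location. It does not draw polygons, and the sample points and the helper nearest_site are hypothetical.

```python
def nearest_site(point, sites):
    """Return the index of the site closest to the given (x, y) point."""
    px, py = point
    return min(
        range(len(sites)),
        # Squared distance is enough for comparing which site is closest.
        key=lambda i: (sites[i][0] - px) ** 2 + (sites[i][1] - py) ** 2,
    )

# Three hypothetical data points (the "sites" of the tessellation).
sites = [(0, 0), (10, 0), (0, 10)]

for location in [(1, 1), (8, 1), (2, 9)]:
    print(location, "falls in the polygon of site", nearest_site(location, sites))
```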

Resources

- RAW FAQs: https://github.com/densitydesign/raw/wiki/FAQs

- Add your own d3.js visualization: https://github.com/densitydesign/raw/wiki/Adding-New-Charts

- Intro to Data Visualization (UCLA Center for Digital Humanities): http://dh101.humanities.ucla.edu/?page_id=40

- Data + Design: a simple introduction to preparing and visualizing data: https://infoactive.co/data-design
