Data Visualization (DSC 530/CIS 568)

Definition & Web Programming

Dr. David Koop

D. Koop, DSC 530, Spring 2019

Exploration <-> Communication Spectrum

Exploration | Confirmation | Communication
Questions <-> Answers/Persuasion

Consecutive Starts by a Quarterback for a Single Team

[K. Quealy, 2013]

The Power of Interactive Visualization

[Figure: Music Timeline, an interactive chart of music popularity by genre over time (time axis 1950 to 2010), searchable by album or artist]

Genres shown: Comedy/Spoken Word/Other, World, Vocal/Easy Listening, Folk, Latin, Reggae, Dance/Electronic, R&B/Soul, Blues, Hip-Hop/Rap, Alternative/Indie, Country, Metal, Jazz, Rock, Pop

Albums shown: Thriller (Michael Jackson), Chocolate Factory (R. Kelly), Back To Black (Amy Winehouse), Good Girl Gone Bad (Rihanna), Marvin Gaye '50' (Marvin Gaye), X (Chris Brown), In The Lonely Hour (Sam Smith), Past Masters (The Beatles), The Essential Elvis Presley 3.0 (Elvis Presley), My Kind Of Christmas (Dean Martin), Teenage Dream (Katy Perry), Four of a Kind - 200 Classic Songs (The Everly Brothers), Delirium (Ellie Goulding), V (Maroon 5)

[Music Timeline, Google Research]

Definition of Visualization

“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.”

- T. Munzner

Example:
• Dataset: NYC Subway Fare Data
• Task: Find Interesting NYC Subway Ridership Patterns

Why People?
• Certain tasks can be totally automated
  - Statistical computations
  - Machine learning algorithms
  - We don’t need visualization for these tasks (although perhaps for debugging them…)
• Analysis problems are often ill-specified
  - What is the correct question?
  - Exploit human visual system, pattern detection capabilities
  - Goal may be an automated solution or a visual analysis system
• Presentation
  - It is often easier to show someone something than to tell them a bunch of facts about the data (and let them explore it)

Why Computers?

[Cerebral, Barsky et al., 2007]

Resource Limitations
• Memory and space constraints
• How many pixels do I have?
• Information Density

[McGuffin & Robert, 2010]

Fig. 2. A tree of regions and major islands of the Philippines, drawn using squarified treemaps (top) and using icicle diagrams (bottom). The two diagrams on the left weight leaf nodes by geographic area, whereas the two diagrams on the right give equal area to leaf nodes. Labels are rotated when necessary to maximize their size.

arbitrarily deep. Our analysis allows us to rank tree representations by their efficiency, which is useful for helping designers choose the most efficient representation allowable within other given constraints.

Our work also quantifies an interesting difference between representations in how they distribute area across nodes. For example, the icicle diagrams in Figures 1C, 2C, and 2D allocate equal area to each level of the tree: the root node has the same area as all the leaf nodes together. Treemaps, in contrast, typically allocate more area to deeper nodes. There is a tradeoff here, since we would like users to be able to see as many deep nodes as possible (which tend to also be the most numerous nodes), while at the same time providing some information about shallow nodes (for example, to give an overview of subtrees, and/or to guide the user in zooming operations). This article develops a new metric, the mean area exponent, that describes the distribution of area across levels of a tree representation, to quantify this tradeoff.
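
The equal-area-per-level property of icicle diagrams described above can be checked with a little arithmetic. The following is a minimal sketch, not code from the paper: it assumes an idealized icicle layout in which every tree level gets one band of equal area and the nodes within a level split their band evenly. The function name `icicle_node_areas` and the example tree shape are hypothetical.

```python
def icicle_node_areas(total_area, nodes_per_level):
    """Return the area of one node at each level of an idealized icicle diagram.

    Assumptions (illustrative only): every level gets one band of equal
    area, and the nodes within a level share that band evenly.
    """
    if not nodes_per_level or any(n < 1 for n in nodes_per_level):
        raise ValueError("each level needs at least one node")
    band_area = total_area / len(nodes_per_level)
    return [band_area / n for n in nodes_per_level]


# Hypothetical complete binary tree with four levels: 1, 2, 4, and 8 nodes.
nodes_per_level = [1, 2, 4, 8]
areas = icicle_node_areas(1600.0, nodes_per_level)

root_area = areas[0] * nodes_per_level[0]
leaf_area_total = areas[-1] * nodes_per_level[-1]
print(areas)                         # area of a single node, per level
print(root_area == leaf_area_total)  # root band equals the combined leaf band
```

Under these assumptions each individual node shrinks as the level gets more crowded, which is the tradeoff the excerpt points at: deep nodes are the most numerous but receive the least area each.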

Finally, we also present a set of design guidelines for using tree representations, as well as a few novel tree representations, including a variation on squarified treemaps that allows for larger labels within the nodes.

2 RELATED WORK

Different tree representations, including classical node-link, icicle, nested enclosure, and indented outline, were identified decades ago in [4, 16], and an interactive version of the indented outline representation (now popular in file browsers such as Microsoft Explorer) was presented in [11]. Subsequent years have seen variations on these representations proposed. Treemaps are a relatively recent innovation, and are a kind of nested enclosure representation. Treemaps are often described as space-filling, a highly desirable property for space-efficiency.

The term “space-filling” can sometimes be problematic, however. For example, a view sometimes expressed [21] holds that tree representations can be divided into two classes: (1) node-link diagrams, that illustrate parent-child relationships with line segments or curves, and (2) space-filling representations, which include treemaps and concentric circles such as Sunburst (Sunburst was described as space-filling in [27], and [21] similarly describe [2] as space-filling). However, these two classes seem to not be disjoint, because some node-link diagrams also “fill space” [19, 20]. The 2nd class also ignores an interesting difference between treemaps and concentric circles, namely that parent nodes in treemaps enclose their children, whereas parents in concentric circle diagrams are adjacent to their children. Finally, the term “space-filling” suggests increased space-efficiency, however it is easy to design a treemap layout algorithm that occupies all available space without making good use of it, for example, by using excessively thick margins, or by concentrating child nodes in only one corner of their parent, leaving the rest of the parent empty and unused. Would such a treemap cease to be considered space-filling, even though its root node covers all the available space? Without a precise definition of “space-filling”, we recommend being cautious about using this term to refer to a category of tree representations, since the name seems to imply that members of the category are more space-efficient than non-members. As an alternative, categories of representations could instead be based on how the nodes are drawn (e.g. representations where the nodes are mapped to points, and those where the nodes are mapped to areas) or on how parent-child relationships are shown (e.g. through line segments, enclosure, adjacency, or relative positioning). The space-efficiency of a given representation can be treated as a separate matter, and evaluated by several metrics, as demonstrated in this article.

Within the graph drawing community, a common approach for evaluating space-efficiency is to compare the total area required by different drawings (i.e., representations) of the same graph or tree. Since any drawing can be scaled arbitrarily in x and y, to ensure a meaningful comparison, the “resolution” of the representations is fixed, often by requiring that nodes be positioned on a grid (i.e. with integer coordinates) [9]. There are problems with this general approach, however, especially when comparing representations of trees rather than graphs. For example, allowing only grid positions may be misleading, because non-integer coordinates can significantly reduce total area without compromising the clarity of the representation or the space available for labels (Figure 3). As a potential remedy, instead of positioning nodes on a grid, we might instead impose a minimum distance between nodes, or a minimum size for non-overlapping labels centered over the nodes. Unfortunately, matters are complicated by the fact that some tree representations (such as Figures 1C, 1E, 1F, 1G) involve nodes that have an area and shape, and there may be nodes and labels of different sizes within a single representation (e.g. deeper nodes may be smaller and have smaller labels). This makes it less clear how to impose a fixed resolution in a way that is fair across tree representations. Note that this issue does not arise in traditional graph drawing, where nodes are typically mapped to points.

Fig. 3. A and B are adapted from a comparison in Figure 5 of [1], and show two different graphical representations of the same tree where nodes are constrained to positions with integer coordinates. B is clearly more compact than A. In C, however, we have redrawn the representation from A with the integer coordinate constraint relaxed, and the resulting graphical representation has a convex hull whose area is only about 5% greater than that in B. Notice also that the minimum horizontal spacing between nodes in B and C is the same, allowing nodes to be overlaid with horizontally oriented labels of the same size in both cases.

In our work, rather than comparing total area with a fixed resolution, we fix the total area available, and fix its aspect ratio. Representations

Administrivia
• Course Web Site
• Syllabus
  - Plagiarism
  - Accommodations
• Textbook:
  - Required: Munzner (VAD)
  - Rec'd: Murray, 2nd ed. (IDV)
• Assignments
• Tests: March 6 and April 10
• Registration:
  - Add/Drop is Wednesday

Do not cheat!

Do not plagiarize
• It is cheating. It violates the Academic Honesty Policy at UMassD.
• Do your own work
• Do not copy anyone else's work, text, sentences, …
  - Anyone = another student, an internet source, book, blog, …
• Never quote text unless there is a specific need.
  - Usually, only famous quotes or very specific definitions
  - "I think there is a world market for maybe five computers." - Thomas Watson (1874-1956), Chairman of IBM, 1943
• Cite sources that back up your claims or reflect the origin of an idea
  - Vertex cover is an NP-Complete problem [1].
    …
    [1] Garey, M. R., and Johnson, D. S., "Computers and intractability: a guide to NP-completeness." 1979.

Do not cheat
• Cheating on assignments, projects, and exams is not allowed
• You will receive a zero on the assignment/project/exam
• It will be reported to the department and university
• If it repeats, you will fail the course
• You can be kicked out of the university

Do ask questions!

Do ask questions
• If you are stuck on a specific issue with an assignment:
  - Do email me with specific questions
  - Do consult books, online documentation, tutorials
  - Do discuss that specific issue with a classmate
• If you are asked about a question:
  - Do not share your code
  - If the questioner is trying to cheat, walk away
  - If you see an obvious mistake, kindly point it out
  - Suggest a specific function or library that may be useful

Definition

“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.”

Why do we visualize data?

Why Graphics?
• Figures are richer; provide more information with less clutter and in less space.
• Figures provide the gestalt effect: they give an overview; make structure more visible.
• Figures are more accessible, easier to understand, faster to grasp, more comprehensible, more memorable, more fun, and less formal.

List adapted from: [Stasko et al. 1998] [via A. Lex]

Why Visual?

[F. J. Anscombe]

       I               II              III             IV
     x       y       x       y       x       y       x       y
  10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
   8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
  13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
   9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
  11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
  14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
   6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
   4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
  12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
   7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
   5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89

Mean of x: 9
Variance of x: 11
Mean of y: 7.50
Variance of y: 4.122
Correlation: 0.816
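
The summary statistics above can be recomputed directly from the four datasets. The sketch below is illustrative and not part of the slides: it assumes the sample variance (n - 1 denominator), which is what reproduces the value 11 for x, and the Pearson correlation coefficient. The names `QUARTET`, `pearson`, and `summarize` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's four datasets, as listed in the table above.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def summarize(xs, ys):
    """Summary statistics for one dataset (sample variance assumed)."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four summaries agree to about two decimal places even though the datasets look very different when plotted, which is the point of the example.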

Why Visual?

[Figure: four scatterplots of Anscombe's datasets (y1 vs. x1, y2 vs. x2, y3 vs. x3, y4 vs. x4); x axes run from 4 to 18, y axes from 4 to 12]

[F. J. Anscombe]

Visual Pop-out

[C. G. Healey]

Visual Perception Limitations

[C. G. Healey]

Human Perception

[Inside NOVA: Change Blindness]

Not Uncommon

[Inside NOVA: Change Blindness]

Other Human Limitations
• Visual working memory is small
• Change blindness: A failure to notice a change in our view
• Inattentional blindness: A failure to notice something else going on in our view while focusing on a particular task
• "The goal of vision is not to build a complete photograph or model of the world in your mind. The goal of vision is to make sense of the meaning of the world around you." - D. Simons

Definition

“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.”

Design Iteration

[K. Quealy, 2013]

Another Design Example

[M. Stefaner, 2013]

Design Studies
• Design Study Methodology, Sedlmair et al., 2012
• Design study: "[A] way to explore the choices made when applying visualization techniques to a particular application area"
• Definition:
  - a project, not a paper
  - a specific real-world problem, with real users and real data
  - design a visualization system: gather requirements, examine multiple ideas
  - validate the design: problem characterization through final tool
  - reflect about lessons learned: improve design guidelines

[Design Study Methodology, Sedlmair et al., 2012]

Design Study
• Steps:
  - Analyzing the problem
  - Abstracting data and tasks
  - Designing and implementing a visualization solution
  - Evaluating the solution with real users
  - Writing up the findings

A design study is a project in which visualization researchers analyze a specific real-world problem faced by domain experts, design a visualization system that supports solving this problem, validate the design, and reflect about lessons learned in order to refine visualization design guidelines.

When are Design Studies Appropriate?

straction, and a decent reflection on guidelines. On the other hand, a very thorough design and evaluation might counterbalance a moderate problem characterization or reflection. Our definitions imply that a design study paper does not require a novel algorithm or technique contribution. Instead, a proposed visualization design is often a well-justified combination of existing techniques. While a design study paper is the most common outcome of a design study, other types of research papers are also possible such as technique or algorithm, evaluation, system, or even a pure problem characterization paper [50].

3.2 Task Clarity and Information Location Axes

We introduce two axes, task clarity and information location, as shown in Figure 1. The two axes can be used as a way to think and reason about problem characterization and abstraction contributions which, although common in design studies, are often difficult to capture and communicate.

The task clarity axis depicts how precisely a task is defined, with fuzzy on the one side and crisp on the other. An example of a crisp task is “buy a train ticket”. This task has a clearly defined goal with a known set of steps. For such crisp tasks it is relatively straightforward to design and evaluate solutions. Although similarly crisp low-level visualization tasks exist, such as correlate, cluster or find outliers [2], reducing a real-world problem to these tasks is challenging and time consuming. Most often, visualization researchers are confronted with complex and fuzzy domain tasks. Data analysts might, for instance, be interested in understanding the evolutionary relationship between genomes [45], comparing the jaw movement between pigs [34], or the relationship between voting behavior and ballot design [94]. These domain tasks are inherently ill-defined and exploratory in nature. The challenge of evaluating solutions against such fuzzy tasks is well-understood in the information visualization community [59].

Task clarity could be considered the combination of many other factors; we have identified two in particular. The scope of the task is one: the goal in a design study is to decompose high-level domain tasks of broad scope into a set of more narrow and low-level abstract tasks. The stability of the task is another: the task might change over the course of the design study collaboration. It is common, and in fact a sign of success, for the tasks of the experts to change after the researcher introduces visualization tools, or after new abstractions cause them to re-conceptualize their work. Changes from external factors, however, such as strategic priority changes in a company setting or research focus changes in an academic setting, can be dangerous.

The second axis is the information location, characterizing how much information is only available in the head of the expert versus what has been made explicit in the computer. In other words, when considering all the information required to carry out a specific task, this axis characterizes how much of the information and context surrounding the domain problem remains as implicit knowledge in the expert’s head, versus how much data or metadata is available in a digital form that can be incorporated into the visualization.

We define moving forward along either of these axes as a design study contribution. Note that movement along one axis often causes movement along the other: increased task clarity can facilitate a better understanding of derived data needs, while increased information articulation can facilitate a better understanding of analysis needs [61].

3.3 Design Study Methodology Suitability

The two axes characterize the range of situations in which design study methodology is a suitable choice. This rough characterization is not intended to define precise boundaries, but rather for guiding the understanding of when, and when not, to use design studies for approaching certain domain problems.

Figure 1 shows how design studies fall along a two-dimensional space spanned by the task clarity and the information location axes. The red and the blue areas at the periphery represent situations for which design studies may be the wrong methodological choice. The red vertical area on the left indicates situations where no or very little data is available. This area is a dangerous territory because an effective visualization design is not likely to be possible; we provide ways to identify this region when winnowing potential collaborations in Section 4.1.2.

[Figure 1: axes INFORMATION LOCATION (head to computer) and TASK CLARITY (fuzzy to crisp); regions labeled NOT ENOUGH DATA, DESIGN STUDY METHODOLOGY SUITABLE, and ALGORITHM AUTOMATION POSSIBLE]

Fig. 1. The task clarity and information location axes as a way to analyze the suitability of design study methodology. Red and blue areas mark regions where design studies may be the wrong methodological choice.

The blue triangular area on the top right is also dangerous territory, but for the opposite reason. Visualization might be the wrong approach here because the task is crisply defined and enough information is computerized for the design of an automatic solution. Conversely, we can use this area to define when an automatic solution is not possible; automatic algorithmic solutions such as machine learning techniques make strong assumptions about crisp task clarity and availability of all necessary information. Because many real-world data analysis problems have not yet progressed to the crisp/computer ends of the axes, we argue that design studies can be a useful step towards a final goal of a fully automatic solution.

The remaining white area indicates situations where design studies are a good approach. This area is large, hinting that different design studies will have different characteristics. For example, the regions towards the top left at the beginning of both axes require significant problem characterization and data abstraction before a visualization can be designed; a paper about such a project is likely to have a significant contribution of this type. Design studies that are farther along both axes will have a stronger focus on visual encoding and design aspects, with a more modest emphasis on the other contribution types. These studies may also make use of combined automatic and visual solutions, a common approach in visual analytics [84].

The axes can also associate visualization with, and differentiate it from other fields. While research in some subfields of HCI, such as human factors, deal with crisply defined tasks, several other subfields, such as computer supported cooperative work and ubiquitous computing, face similar challenges in terms of ill-defined and fuzzy tasks. They differ from visualization, however, because they do not require significant data analysis on the part of the target users. Conversely, fields such as machine learning and statistics focus on data analysis, but assume crisply defined tasks.

4 NINE-STAGE FRAMEWORK

Figure 2 shows an overview of our nine-stage framework with thestages organized into three categories: a precondition phase that de-scribes what must be done before starting a design study; a core phasepresenting the main steps of conducting a design study; and an analy-sis phase depicting the analytical reasoning at the end. For each stagewe provide practical advice based on our own experience, and out-line pitfalls that point to common mistakes. Table 1 at the end of thissection summarizes all 32 pitfalls (PF).

The general layout of the framework is linear to suggest that one stage follows another. Certain actions rely on artifacts from earlier stages; deploying a system is, for instance, not possible without some

[Design Study Methodology, Sedlmair et al., 2012]

A Nine-Stage Framework

[Figure: the nine stages grouped into three phases. PRECONDITION (personal validation): learn, winnow, cast. CORE (inward-facing validation): discover, design, implement, deploy. ANALYSIS (outward-facing validation): reflect, write.]

Fig. 2. Nine-stage design study methodology framework classified into three top-level categories. While outlined as a linear process, the overlapping stages and gray arrows imply the iterative dynamics of this process.

kind of implementation, and it is all too common to jump forward over stages without even considering or starting them. This forward jumping is the first pitfall that we identify (PF-1). A typical example of this pitfall is to start implementing a system before talking to the domain experts, usually resulting in a tool that does not meet their specific needs. We have reviewed many papers that have fatal flaws due to this pitfall.

The linearity of the diagram, however, does not mean that previous stages must be fully completed before advancing to the next. Many of the stages overlap and the process is highly iterative. In fact, jumping backwards to previous stages is the common case in order to gradually refine preliminary ideas and understanding. For example, we inevitably find ourselves jumping backwards to refine the abstractions while writing a design study paper. The overlapping stages and gray arrows in Figure 2 imply these dynamics.

Validation crosscuts the framework; that is, validation is important for every stage, but the appropriate validation is different for each. We categorize validation following the three framework phases. In the precondition phase, validation is personal: it hinges on the preparation of the researcher for the project, including due diligence before committing to a collaboration. In the core phase, validation is inward-facing: it emphasizes evaluating findings and artifacts with domain experts. In the analysis phase, validation is outward-facing: it focuses on justifying the results of a design study to the outside world, including the readers and reviewers of a paper. Munzner’s nested model elaborates further on how to choose appropriate methods at each stage [50].
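The phase structure and the forward-jumping pitfall (PF-1) described above can be sketched as a small data structure. This is only an illustrative sketch: the names `PHASES`, `STAGES`, and `skipped_stages` are hypothetical and do not come from the paper. The check flags any stage that was jumped over without being started, while backward jumps are always allowed.

```python
# Phases, their validation style, and their stages, as read from Fig. 2 and the text.
# All identifiers here are hypothetical, chosen for illustration only.
PHASES = {
    "precondition": {"validation": "personal",
                     "stages": ["learn", "winnow", "cast"]},
    "core": {"validation": "inward-facing",
             "stages": ["discover", "design", "implement", "deploy"]},
    "analysis": {"validation": "outward-facing",
                 "stages": ["reflect", "write"]},
}

# Flattened linear order of the nine stages.
STAGES = [stage for phase in PHASES.values() for stage in phase["stages"]]


def skipped_stages(visited):
    """Return stages that were jumped over (PF-1): earlier than the furthest
    stage reached, but never started. The order of visits does not matter,
    so jumping backwards to refine earlier work is never penalized."""
    if not visited:
        return []
    furthest = max(STAGES.index(stage) for stage in visited)
    seen = set(visited)
    return [stage for stage in STAGES[:furthest] if stage not in seen]


# Hypothetical project log: implementing before talking to domain experts.
print(skipped_stages(["learn", "implement"]))

# An iterative but complete path, including a backward jump to "discover".
print(skipped_stages(["learn", "winnow", "cast", "discover",
                      "design", "discover", "implement"]))
```

The first call reports the stages that were never started before implementation began; the second reports none, since revisiting an earlier stage is the expected, iterative case.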

4.1 Precondition Phase

The precondition stages of learn, winnow, and cast focus on preparing the visualization researcher for the work, and finding and filtering synergistic collaborations with domain experts.

4.1.1 Learn: Visualization Literature

A crucial precondition for conducting an effective design study is a solid knowledge of the visualization literature, including visual encoding and interaction techniques, design guidelines, and evaluation methods. This visualization knowledge will inform all later stages: in the winnow stage it guides the selection of collaborators with interesting problems relevant to visualization; in the discover stage it focuses the problem analysis and informs the data and task abstraction; in the design stage it helps to broaden the consideration space of possible solutions, and to select good solutions over bad ones; in the implement stage knowledge about visualization toolkits and algorithms allows fast development of stable tool releases; in the deploy stage it assists in knowing how to properly evaluate the tool in the field; in the reflect stage, knowledge of the current state-of-the-art is crucial for comparing and contrasting findings; and in the write stage, effective framing of contributions relies on knowledge of previous work.

Of course, a researcher’s knowledge will gradually grow over time and encyclopedic knowledge of the field is not a requirement before conducting a first design study. Nevertheless, starting a design study without enough prior knowledge of the visualization literature is a pitfall (PF-2). This pitfall is particularly common when researchers who are expert in other fields make their first foray into visualization [37]; we have seen many examples of this as reviewers.

4.1.2 Winnow: Select Promising Collaborations

The goal of this stage is to identify the most promising collaborations. We name this strategy winnowing, suggesting a lengthy process of separating the good from the bad and implying that careful selection is necessary: not all potential collaborations are a good match. Premature commitment to a collaboration is a very common pitfall that can result in much unprofitable time and effort (PF-3).

We suggest talking to a broad set of people in initial meetings, and then gradually narrowing down this set to a small number of actual collaborations based on the considerations that we discuss in detail below. Because this process takes considerable calendar time, it should begin well before the intended start date of the implement stage. Initial meetings last only a few hours, and thus can easily occur in parallel with other projects. Only some of these initial meetings will lead to further discussions, and only a fraction of these will continue with a closer collaboration in the form of developing requirements in the discover stage. Finally, these closer collaborations should only continue on into the design stage if there is a clear match between the interests of the domain experts and the visualization researcher. We recommend committing to a collaboration only after this due diligence is conducted; in particular, decisions to seek grant funding for a collaborative project after only a single meeting with a domain expert are often premature. We also suggest maintaining a steady stream of initial meetings at all times. In short, our strategy is: talk with many but stay with few, start early, and always keep looking.

The questions to ask during the winnow stage are framed as reasons to decide against, rather than for, a potential collaboration. We choose this framing because continued investigation has a high time cost for both parties, so the decision to pull out is best made as early as possible. Two of our failure cases underline the cost of late decision-making: the PowerSetViewer [54] design study lasted two years with four researchers, and WikeVis [72] half a year with two researchers. Both projects fell victim to several pitfalls in the winnow and cast stages, as we describe below; if we had known what questions to consider at these early stages we could have avoided much wasted effort.

The questions are categorized into practical, intellectual, and interpersonal considerations. We use the pronouns I for the visualization researcher, and they for the domain experts.

PRACTICAL CONSIDERATIONS: These questions can be easily checked in initial meetings.

Data: Does real data exist, is it enough, and can I have it?

Some potential collaborators will try to initiate a project before real data is available. They may promise to have the data “soon”, or “next

[Design Study Methodology, Sedlmair et al., 2012]

Collaborator Winnowing

[Figure: collaborator winnowing, narrowing from initial conversation to further meetings, prototyping, and full collaboration.]

Talk with many, stay with few!

[Design Study Methodology, Sedlmair et al., 2012]

Design Space: Think Broad

[Figure: design space metaphor. Candidate solutions marked +, o, or - lie within nested spaces labeled know, consider, propose, and select.]

[Design Study Methodology, Sedlmair et al., 2012]

Analysis

• Want to contribute understanding of the way visualization can be used to address specific problem
• How does this relate to other situations?
• Does this align with existing guidelines? Does it challenge them?
• Is there anything that can still be improved?

Definition

“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively”

Why Effectiveness?

• “It’s not just about pretty pictures”
• Any depiction of data requires the designer to make choices about how that data is visually represented
  - Analogy to photography
  - Lots of possibilities (see quarterback study)
• Effectiveness measures how well the visualization helps a person with their tasks
  - How? insight, engagement, efficiency?
  - Benchmarks and user studies

Design: Focus on only the y-axis

[S. Hayward, 2015]

Simple World Assumption

Visualizations violate three CIELAB assumptions:
• Simple World Assumption
• Isolation Assumption
• Geometric Assumption

[D. Szafir, 2017]
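For context, differences between two colors in CIELAB are commonly estimated as the Euclidean distance between their (L*, a*, b*) coordinates, the CIE76 ΔE*ab formula. The sketch below computes that distance; the function name `delta_e_cie76` and the two sample colors are illustrative assumptions and are not taken from the slides.

```python
import math


def delta_e_cie76(lab1, lab2):
    """Euclidean distance between two (L*, a*, b*) triples (CIE76 delta E)."""
    return math.sqrt(sum((c1 - c2) ** 2 for c1, c2 in zip(lab1, lab2)))


# Two made-up colors given directly in CIELAB coordinates (L*, a*, b*).
color_a = (50.0, 10.0, 10.0)
color_b = (53.0, 14.0, 10.0)

# The coordinate differences are (3, 4, 0), so the distance is 5.0.
print(delta_e_cie76(color_a, color_b))
```

Note that the formula only sees two color triples; it has no inputs for viewing conditions, surrounding colors, or the size of the marks being compared.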

Problems with Simple World Assumption

• Viewing distance
• Environmental surround
• Ambient illumination
• Direct illumination
• Gamma, whitepoint, resolution, peak color outputs
• Viewing population

Crowdsourced sampling: Szafir, Stone, & Gleicher, 2014; Reinecke, Flatla, & Brooks, 2016

[D. Szafir, 2017]

Isolation Assumption

[D. Szafir, 2017]


What colors are in this graphic?

Red, yellow, blue

Purple and orange do not exist!

[A. Kitaoka]

Analyzing Visualization

[Figure: visualization analysis framed by three questions: Why? How? What?]

[Munzner (ill. Maguire), 2014]

What languages do we use on the Web?

Languages of the Web

• HTML
• CSS
• SVG
• JavaScript
  - Versions of JavaScript: ES6, ES2015, ES2017…
  - Specific frameworks: React, jQuery, Bootstrap, D3