Exploring and Visualizing Data: Techniques for a clearer presentation of data
Brian Vegetabile
UCI Statistics PhD Student
November 17th, 2015
Outline
• A Case for Data Exploration & Visualization
• Exploring & Visualizing a Single Variable
• Comparing Distributions of Data
• The Iteration Process of Creating a Graphic
• Data & Image Sources
A Case for Considering Data Exploration and Visualization
A Case for Considering Data Visualization
• Graphics can aid the presentation of technical data in the sciences
• However, they are sometimes created without considering how the reader will perceive them
• Misused graphics can cause both the analyst and the reader to miss vital information in the data
• As a reader, it is also your responsibility to look for inconsistencies between technical graphics and the conclusions drawn in the text
Example Graphics: Two Ways of Looking at Sunspots (1)
[Figure: Yearly Sunspot Totals, drawn as a square plot; x-axis: Year (1700 to 2000), y-axis: Sunspot Numbers (0 to 250)]
• Most standard graphics packages create square plots by default
• A square plot ‘squishes’ the information, hiding some of it from the reader
• As a result, the plot fails to communicate a key feature of the data to the reader
Example Graphics: Two Ways of Looking at Sunspots (1)
• Changing the aspect ratio (the graphic's width relative to its height) reveals information hidden in the previous sunspot graphic
• This is sometimes called “banking”
• Note the steep rise in sunspot numbers and the gradual decline after each maximum.
• Considering how a graphic is displayed can be instrumental in communicating as much information as possible to the reader
[Figure: Yearly Sunspot Totals, redrawn with a wide aspect ratio; x-axis: Year (1700 to 2000), y-axis: Sunspot Numbers]
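Banking can be automated. The sketch below uses the median-absolute-slope rule associated with William Cleveland, which is one of several banking criteria and is an assumption here rather than the method used for the slide: pick the height-to-width ratio that makes the median absolute slope of the line segments equal to 1 (45 degrees) on screen. The function name `bank_to_45` is hypothetical.

```python
import numpy as np


def bank_to_45(x, y):
    """Return a height/width aspect ratio that banks the median segment to 45 degrees.

    On screen, a segment's slope is its data slope scaled by
    (height / y_range) / (width / x_range). Setting the median of those
    scaled slopes to 1 and solving for height / width gives
    y_range / (x_range * median|slope|).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.abs(np.diff(y) / np.diff(x))
    slopes = slopes[slopes > 0]  # flat segments would drag the median to zero
    x_range = x.max() - x.min()
    y_range = y.max() - y.min()
    return y_range / (x_range * np.median(slopes))


# A sawtooth that rises and falls steeply: banking asks for a wide, short plot.
x = np.arange(5)
y = np.array([0, 10, 0, 10, 0])
print(bank_to_45(x, y))  # 0.25, i.e. the plot should be 4 times wider than tall
```

The resulting ratio could then be handed to a plotting library, for example matplotlib's `Axes.set_box_aspect`, which sets the height-to-width ratio of the axes box.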
Example Graphics: Perception of the difference in Curves
• The difference between two curves is another example of how information can be lost in the graphing process
• The distance between the curves on the right appears to decrease greatly as the independent variable increases
[Figure: two curves; x-axis: Independent Variable (0 to 5), y-axis: Response Variable (0 to 120)]
Example Graphics: Perception of the difference in Curves
• Once we add a second graphic that plots the difference between the curves, we see that the difference is almost constant!
• Considering alternative presentations of your data is crucial not only for your own understanding of the data but also for your readers'
[Figure: left, the two curves (x-axis: Independent Variable, y-axis: Response Variables); right, their difference (x-axis: Independent Variable, y-axis: Difference in Response Variables, roughly 13 to 17)]
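One way to see why the gap looks like it shrinks: the eye tends to judge the shortest distance between two curves rather than the vertical distance. The sketch below compares the two for a pair of hypothetical curves with a constant vertical offset of 15, after rescaling both axes to [0, 1] to mimic a square panel. The curves and the function name `vertical_and_visual_gap` are assumptions for illustration.

```python
import numpy as np


def vertical_and_visual_gap(x, lower, upper):
    """Compare the true vertical gap between two curves with the shortest gap.

    Both axes are rescaled to [0, 1] to mimic a square plotting panel.
    """
    vertical = upper - lower
    y_min = min(lower.min(), upper.min())
    y_max = max(lower.max(), upper.max())
    xs = (x - x.min()) / (x.max() - x.min())
    ls = (lower - y_min) / (y_max - y_min)
    us = (upper - y_min) / (y_max - y_min)
    # Distance from every point on the upper curve to every point on the lower curve.
    dist = np.hypot(xs[:, None] - xs[None, :], us[:, None] - ls[None, :])
    visual = dist.min(axis=1)  # shortest distance from each upper point to the lower curve
    return vertical, visual


x = np.linspace(0, 5, 501)
lower = 4 * x**2        # hypothetical curves with a constant vertical offset
upper = lower + 15
vertical, visual = vertical_and_visual_gap(x, lower, upper)
print(np.allclose(vertical, 15))  # True: the real difference never changes
print(visual[0] > visual[400])    # True: where the curves are steep, they look closer
```

Plotting `upper - lower` directly, as the slide's second panel does, removes the need to judge the gap by eye at all.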
Example Graphics: Space Shuttle Challenger Analysis (2)
• January 27, 1986: the night before the Space Shuttle Challenger accident
• A three-hour teleconference was held among people at Morton Thiokol, Marshall Space Flight Center, and Kennedy Space Center.
• The discussion focused on the forecast of a 31°F temperature for launch time the next morning and the effect of low temperature on O-ring performance.
[Figure: Space Shuttle Incidents vs. Temperature Prior to Challenger; x-axis: Calculated Joint Temperature (F), y-axis: Number of Incidents (0 to 3)]
Example Graphics: Space Shuttle Challenger Analysis (2)
• The engineers presented only the failures, not the successes
• Based on the U-shaped configuration of points, they concluded that the historical data showed no evidence of a temperature effect.
[Figure: Space Shuttle Incidents vs. Temperature Prior to Challenger, flights with incidents only; x-axis: Calculated Joint Temperature (F), y-axis: Number of Incidents (0 to 3)]
Example Graphics: Space Shuttle Challenger Analysis (2)
• Adding the successes to the graphic reveals a dependence of incidents on joint temperature
• The Rogers Commission concluded that "A careful analysis of the flight history of O-ring performance would have revealed the correlation of O-ring damage in low temperature"
[Figure: Space Shuttle Incidents vs. Temperature Prior to Challenger, including flights with no incidents; x-axis: Calculated Joint Temperature (F), y-axis: Number of Incidents (0 to 3)]
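The effect of dropping the zero-incident flights can be illustrated numerically. The values below are hypothetical and chosen only to mimic the shape of the slides (a U-shaped set of failures plus warm-weather successes); they are NOT the actual flight records. Pearson correlation is used only as a crude summary, not as a proper model for count data.

```python
import numpy as np

# Hypothetical, illustrative values only (NOT the actual Challenger flight data).
fail_temp = np.array([53, 57, 63, 70, 70, 75])    # launches with at least one incident
fail_count = np.array([3, 1, 1, 1, 1, 2])
ok_temp = np.array([66, 67, 68, 69, 72, 73, 76, 78, 79, 81])  # launches with no incidents
ok_count = np.zeros_like(ok_temp)

# Correlation using only the failures, as in the teleconference chart.
r_failures = np.corrcoef(fail_temp, fail_count)[0, 1]

# Correlation after putting the zero-incident launches back in.
all_temp = np.concatenate([fail_temp, ok_temp])
all_count = np.concatenate([fail_count, ok_count])
r_all = np.corrcoef(all_temp, all_count)[0, 1]

print(round(r_failures, 2), round(r_all, 2))  # roughly -0.37 and -0.58
```

With these made-up numbers, the failures alone show only a weak relationship, while including the successes makes the dependence on temperature much stronger.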
Example graphic: Typical Graphic from Science
• Pick up any issue of Science Magazine and you’ll find graphics similar to the one on the right.
• “…Data are means ± SEM of seven to eight mice per genotype for (B) and six mice per genotype for (C). Statistical significance was analyzed by unpaired two-tailed t test. *P < 0.05”
• This graphic is confusing because it represents the data with a “bar chart” even though the data are not categorical.
Exploring and Describing the Distribution of a Single Continuous Variable - Variables of One Dimension
Visualization of a Single Continuous Variable
• Visualizing a single variable is helpful in understanding the distribution of the data.
• Reveals insights beyond summary tables
• e.g., the mean, median, mode, quantiles, etc.
• Many statistical tests assume a particular distribution for the process that generated the data
• e.g., Student's t-test
• The following slides present techniques for assessing the distribution of a variable to aid in summarizing it
• Note: 100 points were randomly simulated to illustrate these cases
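The slides do not say how the 100 points were generated. As a hypothetical stand-in, the sketch below draws 100 points from a two-component normal mixture; the mixture, its parameters, and the seed are all assumptions. Any of the single-variable plots that follow can be drawn from an array like `x`.

```python
import numpy as np

rng = np.random.default_rng(seed=2015)  # fixed seed so the example is reproducible

# Hypothetical two-component mixture: the slides do not state the actual recipe.
n = 100
from_first = rng.random(n) < 0.5  # pick a mixture component for each point
x = np.where(
    from_first,
    rng.normal(loc=0.0, scale=0.25, size=n),
    rng.normal(loc=1.0, scale=0.25, size=n),
)

print(len(x), round(np.mean(x), 2), round(np.median(x), 2))
```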
Dynamite Plots for a Single Variable
• Dynamite plots are rampant throughout the sciences.
• Shown is a dynamite plot of the simulated data
• It shows the mean as a measure of central tendency, with an error bar extending one standard deviation above the mean.
• These plots obscure major information that is hiding within the data!
[Figure: Dynamite Plot for Distribution of Data; y-axis: Value (0.0 to 1.5)]
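A dynamite plot reduces a sample to two numbers: the mean (bar height) and the mean plus one standard deviation (whisker tip). The sketch below uses two small hypothetical samples, one with two clusters and one with a single central cluster, that nevertheless produce identical dynamite plots.

```python
import numpy as np

# Two hypothetical samples with very different shapes.
a = np.array([0.0, 0.0, 2.0, 2.0])                         # two separated clusters
b = np.array([1 - np.sqrt(2), 1.0, 1.0, 1 + np.sqrt(2)])  # one central cluster

for sample in (a, b):
    mean = sample.mean()
    sd = sample.std(ddof=1)  # sample standard deviation
    # A dynamite plot draws only these two numbers.
    print(f"bar height = {mean:.3f}, whisker tip = {mean + sd:.3f}")
# bar height = 1.000, whisker tip = 2.155
# bar height = 1.000, whisker tip = 2.155
```

Both samples have mean 1 and sample standard deviation of about 1.155, so the bars and whiskers coincide even though the underlying data look nothing alike.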
Box & Whisker Plots
• To the right is a Box & Whisker plot of 100 simulated data points.
• Introduced by John Tukey as part of his exploratory data analysis toolkit
• Useful for beginning to understand the data, or to supplement another plot (dot plot or histogram)
• Some packages will also highlight any outliers
[Figure: Distribution of a Variable (box-and-whisker plot); x-axis: x-value (−1.0 to 2.0)]
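The pieces of a box plot can be computed directly. The sketch below follows the common Tukey convention of whiskers reaching the most extreme points within 1.5 × IQR of the quartiles, with anything beyond flagged as an outlier. Quartile definitions differ between packages; this uses NumPy's default linear interpolation. The function name `tukey_box_stats` is hypothetical.

```python
import numpy as np


def tukey_box_stats(data, k=1.5):
    """Compute quartiles, whisker ends, and outliers for a Tukey-style box plot."""
    data = np.asarray(data, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": data[(data < low_fence) | (data > high_fence)],
    }


stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["q1"], stats["median"], stats["q3"])  # 3.25 5.5 7.75
print(stats["whisker_high"], stats["outliers"])   # 9.0 [100.]
```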
Dot plots
• Each data point is plotted along a line.
• Spread and distribution of points are now more obvious.
• Plotted with a measure of central tendency.
[Figure: Distribution of a Variable (dot plot with median line); x-axis: x-value (−1.0 to 2.0)]
Dot plots
• Key Concept: Central Tendency
• A measure of central tendency is a central or typical value for a probability distribution.
• The red line in the graphic marks the median
• The median is a more robust measure of central tendency than the mean: it is less influenced by skew and outliers in the data.
[Figure: Distribution of a Variable (dot plot with the median marked in red); x-axis: x-value (−1.0 to 2.0)]
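A quick illustration of the median's robustness, using made-up numbers: replacing a single value with an extreme one moves the mean a long way but leaves the median unchanged.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = values[:-1] + [100.0]  # replace the largest value with an extreme one

print(statistics.mean(values), statistics.median(values))  # 3.0 3.0
print(statistics.mean(skewed), statistics.median(skewed))  # 22.0 3.0
```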
Dot plots - Adjusting the Alpha Level
• Adjusting the alpha level changes how transparent each data point is
• Adds a level of “depth” to the graphic
• The plot below has its alpha level set to 0.5
• Darker areas have more points than lighter areas
[Figure: Distribution of a Variable (dot plot, alpha = 0.5); x-axis: x-value (−1.0 to 2.0)]
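Why darker areas mean more points: with standard "over" alpha compositing, each point drawn with opacity alpha lets (1 − alpha) of whatever lies beneath show through, so n overlapping points produce an opacity of 1 − (1 − alpha)^n. The helper name below is hypothetical.

```python
def stacked_opacity(alpha, n_points):
    """Opacity of a spot covered by n identical points drawn with the given alpha."""
    return 1 - (1 - alpha) ** n_points


for n in range(1, 5):
    print(n, stacked_opacity(0.5, n))
# 1 0.5
# 2 0.75
# 3 0.875
# 4 0.9375
```

The opacity keeps rising with each extra point but saturates toward 1, so very dense regions can still look alike; a smaller alpha delays that saturation.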
Dot plots - Adding Jitter
• Adding jitter means adding a small amount of random noise to each point's position so that overlapping points separate
• Combined with a lower alpha level, this gives a better idea of the distribution of the data points
[Figure: Distribution of a Variable (jittered dot plot with reduced alpha); x-axis: x-value (−1.0 to 2.0)]
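A minimal jitter sketch, assuming the noise is added perpendicular to the axis (vertically) so that the x-values themselves are preserved; the slides do not specify the direction or amount. The function name `jitter_positions` and the default width are hypothetical.

```python
import numpy as np


def jitter_positions(x, width=0.1, seed=None):
    """Return (x, y) coordinates for a jittered one-dimensional dot plot.

    The x-values are left untouched; each point gets a random vertical
    offset drawn uniformly from [-width, width].
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = rng.uniform(-width, width, size=x.shape)
    return x, y


x, y = jitter_positions([0.2, 0.2, 0.2, 1.1], width=0.1, seed=0)
# These could then be drawn with, e.g., matplotlib's ax.scatter(x, y, alpha=0.5).
```

Keeping the noise off the measured axis means the jitter never distorts the values being read.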
Histograms
• Histograms reveal even more information than the previous two plots!
• The simulated data were actually multimodal
• Note: When using histograms it’s also necessary to consider bin width
[Figure: Distribution of a Variable (histogram); x-axis: x-value (−1.0 to 2.0), y-axis: Frequency]
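Choosing a bin width amounts to choosing the bin edges. The sketch below builds equally spaced edges of a given width (anchored at a multiple of that width, an arbitrary choice) and counts observations with `np.histogram`. The helper name `histogram_with_width` is hypothetical.

```python
import numpy as np


def histogram_with_width(data, width):
    """Count observations in bins of a fixed width, anchored at a multiple of width."""
    data = np.asarray(data, dtype=float)
    start = np.floor(data.min() / width) * width
    # Extend one extra width so the maximum value falls inside the last bin.
    edges = np.arange(start, data.max() + width, width)
    counts, edges = np.histogram(data, bins=edges)
    return counts, edges


counts, edges = histogram_with_width([0.1, 0.2, 0.3, 1.1, 1.2, 1.9], width=0.5)
print(counts)  # [3 0 2 1]
```

Rerunning with different `width` values is a quick way to produce a comparison like the one on the next slide.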
Histograms - Comparing bin widths
[Figure: six histograms of the simulated data, titled “Bin width ± 0.05”, “Bin width ± 0.1”, “Bin width ± 0.2”, “Bin width ± 0.5”, “Bin width ± 1”, and “Bin width ± 5”; x-axis: x-value, y-axis: Frequency]
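There is no single correct bin width, but rules of thumb exist. One widely used option, not mentioned on the slides, is the Freedman–Diaconis rule, h = 2 · IQR · n^(−1/3). NumPy offers it as `bins="fd"`; it then rounds to a whole number of bins spanning the data range, so the actual width can be slightly smaller than h. The sample below is simulated and hypothetical.

```python
import numpy as np


def freedman_diaconis_width(data):
    """Freedman-Diaconis bin width: 2 * IQR * n ** (-1/3)."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)


rng = np.random.default_rng(0)
sample = rng.normal(size=1000)  # hypothetical data
h = freedman_diaconis_width(sample)

# NumPy applies the same rule, then uses a whole number of equal-width bins.
edges = np.histogram_bin_edges(sample, bins="fd")
print(round(h, 3), round(edges[1] - edges[0], 3))
```

A rule like this is only a starting point; as the comparison above shows, it is worth looking at several widths before settling on one.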
Combining plots
• Combining plots sometimes paints a clearer picture
• Together they show the modality, the total number of points, and the five-number summary
[Figure: Utilizing Three Plots; x-axis from −1.0 to 2.0, y-axis: Frequency]
Quantile Plots - Normal QQ-Plot
• Quantile-Quantile Plots are both simple and powerful
• Many statistical tests assume that the data being tested were generated by a normal distribution.
• Normal QQ-plots compare the quantiles of a sample with the theoretical quantiles of a normal distribution
[Figure: Normal Q−Q Plot; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]
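A normal QQ-plot pairs each sorted observation with a quantile of the standard normal distribution. The sketch below uses plotting positions (i − 0.5)/n, one common convention (R's `qqnorm` uses a closely related choice), and Python's standard-library `statistics.NormalDist`. The function name `normal_qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each sorted observation with a standard-normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


theoretical, sample_q = normal_qq_points([2.1, -0.3, 0.9])
print([round(t, 3) for t in theoretical])  # [-0.967, 0.0, 0.967]
print(sample_q)                            # [-0.3, 0.9, 2.1]
```

Plotting `sample_q` against `theoretical` gives the QQ-plot; points close to a straight line are consistent with normality, while systematic bends suggest skew, heavy tails, or multiple modes.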
Quantile Plots - QQ-Plot
• What does the sampled data look like compared with a normal distribution?
• As expected, the multimodal data do not follow the normal distribution well.
• This is another plot for understanding the distributional characteristics of the observed data
[Figure: Normal Q−Q Plot of the simulated data; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles (−1.0 to 2.0)]
Logarithmic Transformation of a Distribution
• Again, many tests assume that the data are normally distributed
• However, many types of data are not normally distributed on their original scale.
• It is sometimes necessary to transform the data to a new scale that preserves their order but on which they are approximately normally distributed
• Salaries and other positive, right-skewed data are natural candidates for a logarithmic transformation
[Figure: 141 Major North American River Lengths, Obtained by USGS; histograms of River Length (0 to 4000) and of log(River Length) (3 to 9); y-axis: Frequency]
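The effect of a log transformation can be checked with a skewness measure. The sketch below simulates hypothetical positive, right-skewed (log-normal) data standing in for something like salaries, and compares the sample skewness before and after taking logs. The helper name `skewness` and the simulation parameters are assumptions.

```python
import numpy as np


def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed (population) SD."""
    values = np.asarray(values, dtype=float)
    dev = values - values.mean()
    return np.mean(dev**3) / np.mean(dev**2) ** 1.5


rng = np.random.default_rng(1)
raw = rng.lognormal(mean=6.0, sigma=1.0, size=1000)  # hypothetical right-skewed data
logged = np.log(raw)  # order-preserving; requires strictly positive values

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

The raw values are strongly right-skewed, while the logged values are close to symmetric. Note that the logarithm is undefined at zero, so data containing zeros need a different transformation or an offset.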
Visualizing the Distribution of a Single Continuous Variable - Variables with More Dimensions
Scatterplots
• Scatter plots are essentially the multidimensional analog of dot plots
[Figure: scatter plot; x-axis: Dimension 1 (−4 to 4), y-axis: Dimension 2 (−3 to 3)]
Scatterplots
• As with dot plots, adjusting alpha reveals the ‘depth’ of the points
[Figure: four scatter plots of the same data, titled “Alpha: 0.25”, “Alpha: 0.5”, “Alpha: 0.75”, and “Alpha: 1”; x-axis: Dimension 1 (−4 to 4), y-axis: Dimension 2 (−3 to 3)]
Scatterplots with Histograms
• Scatter plots can be combined with additional plots, such as histograms along the margins, to make the picture clearer
[Figure: scatter plot of Dimension 1 vs. Dimension 2, with a histogram (Frequency) of each dimension along the corresponding margin]
More Dimensions → Pairs Plots
• As the number of dimensions grows, combining scatter plots and histograms in a pairs plot can be very effective
[Figure: pairs plot of Variables 1 to 4, with a histogram (Frequency) of each variable on the diagonal and pairwise scatter plots off the diagonal]
Visualizing Categorical Variables
Categorical Data
• Categorical data are often represented as a table of quantities.
• MLB National League East standings as of July 26th, 2015:
Team                     Wins   Losses   Percentage
Washington Nationals       52       45   0.5360825
New York Mets              51       48   0.5151515
Atlanta Braves             46       52   0.4693878
Miami Marlins              41       58   0.4141414
Philadelphia Phillies      37       63   0.3700000
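The Percentage column is each team's wins divided by its games played (wins plus losses), as the sketch below reproduces from the table's own numbers.

```python
standings = {
    "Washington Nationals": (52, 45),
    "New York Mets": (51, 48),
    "Atlanta Braves": (46, 52),
    "Miami Marlins": (41, 58),
    "Philadelphia Phillies": (37, 63),
}

# Win percentage = wins / (wins + losses).
for team, (wins, losses) in standings.items():
    print(f"{team:<22} {wins / (wins + losses):.7f}")
# Washington Nationals   0.5360825
# New York Mets          0.5151515
# Atlanta Braves         0.4693878
# Miami Marlins          0.4141414
# Philadelphia Phillies  0.3700000
```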
Categorical Data - Pie Charts
• Many people interested in data visualization will tell you never to use pie charts…
• Often used to show “Percent of the Whole”
• … but relative scale between variables is often lost
[Figure: pie chart with one slice per team: Washington Nationals, New York Mets, Atlanta Braves, Miami Marlins, Philadelphia Phillies]
Categorical Data - Bar Charts
• One way to remedy this is to display the data as a bar chart.
• Relative win percentages are now much clearer.
• The Nationals are doing much better than the Phillies.
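What makes the bars comparable is that every bar starts at zero on a shared 0.0 to 1.0 axis. A minimal text-only Python sketch of that idea (a plotting library would normally draw the bars; the `text_bar_chart` helper and its `width` parameter are hypothetical):

```python
def text_bar_chart(values: dict, width: int = 40, scale_max: float = 1.0) -> str:
    """Render one bar per category, all on a common 0..scale_max axis."""
    label_width = max(len(name) for name in values)
    lines = []
    for name, value in values.items():
        # Bar length is proportional to the value on the shared scale.
        n = round(value / scale_max * width)
        lines.append(f"{name:<{label_width}} |{'#' * n} {value:.3f}")
    return "\n".join(lines)


win_pct = {
    "Washington Nationals": 52 / 97,
    "New York Mets": 51 / 99,
    "Atlanta Braves": 46 / 98,
    "Miami Marlins": 41 / 99,
    "Philadelphia Phillies": 37 / 100,
}
print(text_bar_chart(win_pct))
```

Fixing `scale_max` at 1.0 rather than at the largest value keeps the bars honest about how far each team is from a perfect record.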
[Figure: bar chart titled “NL East Win Percentage as of July 26th, 2015”, one bar per team; y-axis: Win Percentage, 0.0 to 1.0]
Categorical Variables - Dot and Line Charts
• Switching to a dot and line chart yields more information: we see each team's wins relative to its losses across the NL East.
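In the dot and line chart, losses are plotted as negative values to the left of zero and wins as positive values to the right, so each team becomes one horizontal segment from −losses to +wins. A minimal Python sketch that prepares those segments, ordered by win percentage (`diverging_segments` is a hypothetical helper; the ordering is an assumption about how the rows were arranged):

```python
def diverging_segments(standings: dict) -> list:
    """Return (team, -losses, wins) rows ordered by win percentage, best first."""
    rows = []
    for team, (wins, losses) in standings.items():
        # Losses extend left of zero, wins extend right of zero.
        rows.append((team, -losses, wins))
    # r[2] - r[1] equals wins + losses, so the key is the win percentage.
    rows.sort(key=lambda r: r[2] / (r[2] - r[1]), reverse=True)
    return rows


standings = {
    "Washington Nationals": (52, 45),
    "New York Mets": (51, 48),
    "Atlanta Braves": (46, 52),
    "Miami Marlins": (41, 58),
    "Philadelphia Phillies": (37, 63),
}
for team, left, right in diverging_segments(standings):
    print(f"{team:<22} from {left:>4} to {right:>3}")
```

Because both sides share one axis, the segment lengths show the total games played while the position of zero within each segment shows the balance of wins and losses.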
[Figure: dot and line chart titled “NL East Standings as of July 26, 2015”; for each team (Washington Nationals, New York Mets, Atlanta Braves, Miami Marlins, Philadelphia Phillies), losses extend left of zero and wins extend right, on an axis from −65 to 55]
Comparing Distributions
Comparing Distributions
• Often we are interested in comparing more than one distribution.
• Here we simulated 1000 draws from 3 separate beta distributions
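The slides do not give the beta parameters, so the shape parameters below are hypothetical, chosen only so that the three samples sit in different regions of [0, 1]. A minimal Python sketch using the standard library's `random.Random.betavariate(alpha, beta)`, drawing 1000 values per distribution (`simulate` and `PARAMS` are illustrative names):

```python
import random
import statistics

# Hypothetical (alpha, beta) shape parameters; the original slides do not give them.
PARAMS = {
    "Distribution 1": (2, 20),
    "Distribution 2": (8, 24),
    "Distribution 3": (600, 300),
}


def simulate(params: dict, n: int = 1000, seed: int = 1) -> dict:
    """Draw n values from each beta distribution using a seeded generator."""
    rng = random.Random(seed)
    return {name: [rng.betavariate(a, b) for _ in range(n)]
            for name, (a, b) in params.items()}


samples = simulate(PARAMS)
for name, xs in samples.items():
    print(f"{name}: mean={statistics.mean(xs):.3f}, sd={statistics.stdev(xs):.3f}")
```

A beta(α, β) distribution has mean α / (α + β), so large, proportional parameters such as (600, 300) give a narrow distribution centered near 2/3.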
[Figure: density histograms of Distribution 1 (X from 0.00 to 0.30), Distribution 2 (X from 0.1 to 0.4), and Distribution 3 (X from 0.60 to 0.72), each drawn on its own scale]
Comparing Distributions - Common Scale
• Putting each distribution on a common scale allows us to see relative spreads, relative centers, etc.
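A common scale amounts to using the same bin edges and x-range for every histogram. A minimal Python sketch, assuming density-scaled histograms (bar height = count / (n × bin width), so each histogram has area 1) over the shared range [0, 1]; the bin count and the `density_histogram` name are illustrative choices:

```python
from typing import List


def density_histogram(xs: List[float], bins: int = 20,
                      lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Density heights over `bins` equal-width bins on a shared [lo, hi] range.

    Values outside [lo, hi] are not binned but still count toward n.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        if lo <= x <= hi:
            # The right edge hi is included in the last bin.
            counts[min(int((x - lo) / width), bins - 1)] += 1
    n = len(xs)
    return [c / (n * width) for c in counts]


print(density_histogram([0.1, 0.1, 0.6, 0.9], bins=2))
```

Since every sample is binned on identical edges, the resulting panels can be placed side by side and compared bar for bar.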
[Figure: the same three density histograms (Distributions 1, 2, and 3), each now drawn with X from 0.0 to 1.0]
Comparing Distributions - Common Plot
• Finally, overlaying them on a common plot shows how the densities compare with each other on common x and y scales
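Overlaying densities requires evaluating them all on one x-grid and choosing a y-limit tall enough for the highest peak, otherwise a narrow distribution is clipped. The slides do not say how their density curves were computed, so this Python sketch swaps in a Gaussian kernel density estimate with the normal reference bandwidth h = 1.06 · sd · n^(−1/5), an assumption; `gaussian_kde` and `overlay` are illustrative names:

```python
import math
import statistics
from typing import Dict, List


def gaussian_kde(xs: List[float], grid: List[float]) -> List[float]:
    """Gaussian kernel density estimate of `xs`, evaluated at each grid point.

    Bandwidth uses the normal reference rule h = 1.06 * sd * n**(-1/5).
    """
    n = len(xs)
    h = 1.06 * statistics.stdev(xs) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs)
            for g in grid]


def overlay(samples: Dict[str, List[float]], points: int = 201):
    """Evaluate every density on one shared x-grid and pick a shared y-limit."""
    grid = [i / (points - 1) for i in range(points)]  # common x-scale, [0, 1]
    curves = {name: gaussian_kde(xs, grid) for name, xs in samples.items()}
    y_max = max(max(c) for c in curves.values())      # tallest peak of any curve
    return grid, curves, (0.0, 1.05 * y_max)          # common y-scale with headroom
```

Taking the y-limit from the tallest curve is what lets a peaked distribution and a spread-out one share the same axes without either being distorted.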
[Figure: “All 3 Distributions”, the three densities overlaid on one plot; X from 0.0 to 1.0, y-axis: Density]
Comparing Distributions
• The benefit of a common scale can be even more dramatic in more dimensions
[Figure: scatter plots of Distribution 1 (X1 vs. Y1, X1 roughly 9 to 15, Y1 roughly 12 to 18) and Distribution 2 (X2 vs. Y2, X2 roughly 0 to 10, Y2 roughly 5 to 25), each on its own scale]
Comparing Distributions - Common Scale
• Common scales allow us to see the relative sizes of the distributions
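For two-dimensional scatter plots, a common scale means giving every panel the same x- and y-limits, computed from the union of all the point clouds. A minimal Python sketch, assuming points are stored as (x, y) tuples (`shared_limits` and its `pad` parameter are hypothetical):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shared_limits(*clouds: List[Point], pad: float = 0.0):
    """Return ((x_min, x_max), (y_min, y_max)) covering every point cloud."""
    xs = [x for cloud in clouds for x, _ in cloud]
    ys = [y for cloud in clouds for _, y in cloud]
    return ((min(xs) - pad, max(xs) + pad),
            (min(ys) - pad, max(ys) + pad))


print(shared_limits([(9, 12), (15, 18)], [(0, 5), (10, 25)]))
```

The same limits are then applied to each panel, for example with matplotlib's `Axes.set_xlim` and `Axes.set_ylim`.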
[Figure: the same two scatter plots, Distribution 1 (X1 vs. Y1) and Distribution 2 (X2 vs. Y2), both drawn with X from 0 to 15 and Y from 5 to 25]
Comparing Distributions - Common Plot
• With a common plot, we can also see the distance between the two centers and assess how much the distributions overlap
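The distance between centers can also be put into numbers. A minimal Python sketch that takes each cloud's centroid (the coordinate-wise mean, our choice of "center") and computes the Euclidean distance between the two centroids (`centroid` and `center_distance` are illustrative names):

```python
import math
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a point cloud."""
    return (mean(x for x, _ in points), mean(y for _, y in points))


def center_distance(a: List[Point], b: List[Point]) -> float:
    """Euclidean distance between the centroids of two point clouds."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return math.hypot(bx - ax, by - ay)


print(center_distance([(0, 0), (2, 0), (0, 2), (2, 2)], [(4, 5), (4, 5)]))
```

The number complements the plot rather than replacing it: two clouds with the same center distance can overlap a lot or hardly at all, depending on their spreads.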
[Figure: “Distribution 1 vs. Distribution 2”, both point clouds on one scatter plot; X from 0 to 15, Y from 5 to 25]
The Iterative Process of Creating a Graphic
Exploring data: Stepping through the Process
• The data were simulated for illustration, based on the following study:
• “Maternal exposure to childhood trauma is associated during pregnancy with placental-fetal stress physiology,” Biological Psychiatry (to appear) [3]
• Goal: Examine the hypothesis that intergenerational transmission may begin during intrauterine life via the effect of maternal childhood trauma exposure on placental-fetal stress physiology, specifically placental corticotrophin-releasing hormone (pCRH).
• Interested in examining the effects of childhood trauma exposure on placental corticotrophin-releasing hormone production over gestational age.
• This simulated data will help demonstrate the iterative design process of a graphic
Describing the data
• The simulated data represent a “sociodemographically-diverse cohort of 88 pregnant women.”
• Placental CRH concentrations were quantified in maternal blood collected serially over the course of gestation.
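The slides do not show how the data were simulated, so the Python sketch below is a hypothetical stand-in with the same shape: 88 women, serial draws at five collection phases, and log(pCRH) rising linearly with gestational age. Every numeric value (phase weeks, exposure rate, slopes, noise levels, the shrinking trauma shift) is an illustrative assumption, not an estimate from the study; `Visit` and `simulate_cohort` are hypothetical names.

```python
import math
import random
from typing import List, NamedTuple


class Visit(NamedTuple):
    subject: int
    trauma: bool
    ga_weeks: float  # gestational age at the blood draw
    pcrh: float      # pCRH concentration (arbitrary units here)


def simulate_cohort(n_women: int = 88, seed: int = 2015) -> List[Visit]:
    """Hypothetical serial pCRH data: log(pCRH) linear in gestational age.

    All parameter values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    phases = [15, 20, 25, 31, 36]  # hypothetical collection phases (weeks)
    visits = []
    for subject in range(n_women):
        trauma = rng.random() < 0.3          # hypothetical exposure rate
        intercept = 3.0 + rng.gauss(0, 0.3)  # subject-level intercept
        slope = 0.16 + rng.gauss(0, 0.01)    # subject-level slope
        for phase in phases:
            ga = phase + rng.uniform(-1.5, 1.5)
            # Hypothetical trauma shift that shrinks as gestation advances.
            shift = -0.4 * (40 - ga) / 25 if trauma else 0.0
            log_pcrh = intercept + slope * (ga - 15) + shift + rng.gauss(0, 0.2)
            visits.append(Visit(subject, trauma, ga, math.exp(log_pcrh)))
    return visits
```

Giving each woman her own intercept and slope is what produces the individual trajectories examined later; the fixed seed makes the simulated cohort reproducible.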
What does the data look like?
• What does the relationship between pCRH and gestational age look like before taking treatment effects or individual effects into consideration?
• We are interested in understanding how pCRH generally changes across gestational age.
[Figure: scatter plot “Relationship between Gestational Age and pCRH”; x-axis: Gestational Age (15 to 40), y-axis: pCRH (0 to 1200)]
Transforming the Response
• The last plot suggests an exponential relationship
• It is often useful to check whether this relationship becomes linear on a logarithmic scale, so that linear regression can be applied
• The plot to the right shows the transformed response, log(pCRH)
• Notice that there is a clear linear relationship on this scale
[Figure: scatter plot “Relationship between Gestational Age and log(pCRH)”; x-axis: Gestational Age (15 to 40), y-axis: log(pCRH) (3 to 7)]
Is there a difference between the groups?
• We can now begin to explore differences in pCRH production across gestational age between those who had experienced childhood trauma and those who did not.
• … it doesn’t look like there’s much of a difference.
• Let’s investigate whether the variability in slopes differs between the two groups.
[Figure: two scatter plots of log(pCRH) (2 to 8) vs. Gestational Age (15 to 40), titled “Experienced Childhood Trauma” and “Did Not Experience Childhood Trauma”]
Do the individual trajectories vary between the groups?
• Connecting each woman’s points into an individual trajectory lets us see whether the variability of the trajectories differs between the two groups
• … again, it doesn’t look like there’s much of a difference.
• It appears that there are 5 distinct collection phases across gestational age. What if we bin these together and investigate that way?
[Figure: the same two panels, “Experienced Childhood Trauma” and “Did Not Experience Childhood Trauma”, with each woman’s points connected into an individual trajectory; x-axis: Gestational Age (15 to 40), y-axis: log(pCRH) (2 to 8)]
Grouping by Week Clusters?
• We’ve grouped the observations into five-week clusters of gestational age
• Now we can look at the distribution of points within each cluster.
• With the two groups in separate plots, it’s hard to tell whether they differ
• Let’s put them back on the same plot for a side-by-side comparison!
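Binning is a small mapping from gestational age to a group label. A minimal Python sketch, assuming left-closed five-week bins (for example, 20 ≤ age < 25 maps to "20-25") and placing ages of 40 weeks or more in the last bin; the slides do not say which edge is inclusive, and `week_group` is an illustrative name:

```python
def week_group(ga_weeks: float) -> str:
    """Map gestational age (weeks) to a five-week group label.

    Bins are assumed left-closed; ages of 40 weeks or more go to "35-40".
    """
    if ga_weeks < 20:
        return "<20"
    lower = min(int((ga_weeks - 20) // 5) * 5 + 20, 35)
    return f"{lower}-{lower + 5}"


print([week_group(g) for g in (15.2, 20.0, 24.9, 33.0, 39.5)])
```

Because visits within one collection phase fall at slightly different weeks, binning pools them so that each group's distribution can be summarized per phase.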
[Figure: two panels, “Experienced Childhood Trauma” and “Did Not Experience Childhood Trauma”, each showing the distribution of log(pCRH) (3 to 7) within each group; x-axis: Gestational Age Grouped Every Five Weeks (<20, 20-25, 25-30, …)]
Side by Side Distributions
• Comparing the distributions side by side within each week group is far more informative
• We now see that the median pCRH for those who experienced childhood trauma is lower than that of those who did not, across gestational age
• We also see that the differences between the medians get smaller across gestational age, suggesting an interaction between gestational age and trauma exposure in their effect on pCRH.
• Now let’s tell the whole story!
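The pattern in the plot can be summarized as the gap between group medians within each week bin. A minimal Python sketch, assuming rows of (bin label, trauma, log(pCRH)) (a hypothetical layout; `median_gap_by_bin` is an illustrative name). A positive gap that shrinks from bin to bin matches the description above; this is only a descriptive summary, and testing the interaction would need a model with a gestational age × trauma term.

```python
from collections import defaultdict
from statistics import median


def median_gap_by_bin(rows):
    """rows: (week_group_label, trauma, log_pcrh) tuples.

    Returns {label: median(no trauma) - median(trauma)} for every bin
    that has observations in both groups, in first-seen label order.
    """
    cells = defaultdict(list)
    for label, trauma, y in rows:
        cells[(label, trauma)].append(y)
    labels = list(dict.fromkeys(label for label, _ in cells))
    return {label: median(cells[(label, False)]) - median(cells[(label, True)])
            for label in labels
            if (label, False) in cells and (label, True) in cells}
```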
[Figure: “Comparison of log(pCRH) Across Trauma”, side-by-side distributions of log(pCRH) (3 to 7) for each gestational-age group (<20, 20-25, 25-30, 30-35, 35-40); x-axis: Gestational Age Grouped Every Five Weeks; legend: Did Not Experience Trauma, Experienced Trauma]
Telling the Whole Story: A Completed Graphic
• We can now take the graphics that we’ve created through the exploratory phase and construct a combined graphic to tell the whole story
• The two leftmost graphics highlight the individual trajectories, while the last graphic captures how the difference between the groups changes over gestational age
[Figure: combined three-panel graphic: individual log(pCRH) trajectories (2 to 8) vs. Gestational Age (15 to 40) for women who “Experienced Childhood Trauma” and for those who “Did Not Experience Childhood Trauma”, plus the “Comparison of log(pCRH) Across Trauma” panel by Gestational Age Grouped Every Five Weeks (<20 through 35-40); legend: Did Not Experience Trauma, Experienced Trauma]
Outlining the General Strategy for the Creation of Graphics
• It’s necessary to explore your data to fully understand how it’s behaving
• The goal is to pack a large amount of quantitative information into a small region.
• Consider how a reader would perceive the graphic that you’ve presented.
• Combine graphics when needed to tell the entire story.
• Study the subject-matter domain carefully, so you recognize when the data need further investigation
• Graphing data should be an iterative, experimental process
Further Investigation
• Multidimensional Visualization techniques
• Visualizing Categorical Variables
• Visualization Techniques for combining Categorical and Continuous Variables
• Loess Smoothing for Scatter Plots
• Techniques for Time Series Data
• Techniques for Spatial Data
Texts/References
• Texts
• The Elements of Graphing Data - William S. Cleveland
• Visualizing Data - William S. Cleveland
• The Visual Display of Quantitative Information - Edward Tufte
• Articles
• Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods - William Cleveland and Robert McGill
• Let’s Practice What We Preach: Turning Tables into Graphs - Andrew Gelman, Cristian Pasarica, and Rahul Dodhia
References
1. Cleveland, W. S. (1994). The Elements of Graphing Data. Murray Hill, NJ: AT&T Bell Laboratories.
2. Dalal, S. R., Fowlkes, E. B., and Hoadley, B. (1989). Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure. Journal of the American Statistical Association, 84(408), 945-957. DOI: 10.1080/01621459.1989.10478858
3. Moog, N. K., Buss, C., Entringer, S., Shahbaba, B., Gillen, D., Hobel, C. J., and Wadhwa, P. D. (2015). Maternal exposure to childhood trauma is associated during pregnancy with placental-fetal stress physiology. Biological Psychiatry (to appear).
Data & Image Sources
• Image - Flight Patterns - http://users.design.ucla.edu/~akoblin/work/faa/
• Data - Sunspots - WDC-SILSO, Royal Observatory of Belgium, Brussels (http://www.sidc.be/silso/datafiles)