Organizing and Displaying Data
Data Files
Data is almost always stored in a format where:
ROWS are cases or individuals
and
COLUMNS are variables
ID EJEC SYSVOL DIAVOL OCCLU STEN TIME OUTCOME AGE SMOKE BETA CHOL SURG
390 72 36 131 0 0 143 0 49 2 2 59 0
279 52 74 155 37 63 143 0 54 2 2 68 1
391 62 52 137 33 47 16 2 56 2 2 52 0
201 50 165 329 33 30 143 0 42 2 2 39 0
202 50 47 95 0 100 143 0 46 2 2 74 1
69 27 124 170 77 23 143 0 57 2 2 NA 2
310 60 86 215 7 50 40 0 51 2 2 58 0
392 72 37 132 40 10 9 5 56 2 2 75 0
311 60 65 163 0 40 142 0 45 2 2 72 0
393 63 52 140 0 10 142 0 46 2 2 90 0
70 29 117 164 50 0 142 0 48 2 2 72 0
203 48 69 133 0 27 142 0 54 2 2 NA 0
394 59 54 133 30 13 142 0 39 2 1 NA 0
204 50 67 135 37 63 141 0 49 2 2 86 2
280 53 65 138 0 33 140 0 58 2 1 49 0
55 17 184 221 57 13 5 1 50 2 2 70 2
79 37 88 140 37 47 118 5 58 2 2 NA 0
205 45 106 193 33 43 140 0 47 1 1 38 1
206 43 85 150 0 50 23 5 51 2 2 61 0
312 60 59 149 7 37 139 0 43 2 1 56 0
80 38 103 168 47 43 100 1 55 2 2 62 1
281 57 53 124 0 57 140 0 58 2 1 93 0
207 44 68 121 27 60 139 0 55 2 2 63 1
282 51 53 109 0 77 139 0 41 2 2 45 4
396 63 58 157 0 73 139 0 51 2 2 60 0
208 49 81 157 13 13 139 0 49 2 2 60 0
209 48 58 112 0 0 72 1 56 2 2 57 0
283 58 71 167 27 0 138 0 45 2 1 46 0
210 42 92 159 0 0 139 0 57 2 2 58 0
397 68 50 156 0 100 138 0 51 2 1 NA 0
211 43 146 259 47 33 3 1 56 2 2 70 0
398 67 43 130 0 70 138 0 49 2 2 NA 3
284 52 70 146 0 23 137 0 47 1 2 NA 0
399 63 73 195 27 0 136 0 36 1 1 61 0
285 54 62 133 33 23 137 0 38 2 2 NA 0
71 37 93 148 47 0 137 0 59 2 2 NA 0
286 51 65 133 43 7 136 0 54 2 2 NA 0
212 42 95 163 40 10 109 3 57 2 2 NA 4
400 66 49 144 10 50 65 1 52 2 2 55 0
287 54 66 145 7 40 136 0 47 2 2 62 0
81 39 144 237 13 87 136 0 39 2 2 56 3
813 63 52 141 0 47 43 3 48 2 2 NA 0
68 30 219 314 33 45 76 1 53 1 2 NA 0
288 59 39 94 0 0 135 0 47 1 2 63 0
407 67 39 117 0 73 53 1 57 2 2 62 2
Complete Data Table on Male Heart Attack Patients
Portion of the Data Table on Male Heart Attack Patients
Row 1
Variable values for subject #390
ID EJEC SYSVOL DIAVOL OCCLU STEN TIME OUTCOME AGE SMOKE BETA CHOL SURG
390 72 36 131 0 0 143 0 49 2 2 59 0
279 52 74 155 37 63 143 0 54 2 2 68 1
391 62 52 137 33 47 16 2 56 2 2 52 0
201 50 165 329 33 30 143 0 42 2 2 39 0
202 50 47 95 0 100 143 0 46 2 2 74 1
69 27 124 170 77 23 143 0 57 2 2 NA 2
310 60 86 215 7 50 40 0 51 2 2 58 0
392 72 37 132 40 10 9 5 56 2 2 75 0
311 60 65 163 0 40 142 0 45 2 2 72 0
393 63 52 140 0 10 142 0 46 2 2 90 0
Column 3
Systolic volume for the first 10 subjects
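The rows-as-cases, columns-as-variables layout can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the names `COLUMNS`, `RAW_ROWS`, `parse_value` and `parse_rows` are assumptions made here, and the ten rows are copied from the portion of the table shown above.

```python
# Column names, in the order used by the data table.
COLUMNS = ["ID", "EJEC", "SYSVOL", "DIAVOL", "OCCLU", "STEN", "TIME",
           "OUTCOME", "AGE", "SMOKE", "BETA", "CHOL", "SURG"]

# First ten rows of the heart attack data table (one case per line).
RAW_ROWS = """\
390 72 36 131 0 0 143 0 49 2 2 59 0
279 52 74 155 37 63 143 0 54 2 2 68 1
391 62 52 137 33 47 16 2 56 2 2 52 0
201 50 165 329 33 30 143 0 42 2 2 39 0
202 50 47 95 0 100 143 0 46 2 2 74 1
69 27 124 170 77 23 143 0 57 2 2 NA 2
310 60 86 215 7 50 40 0 51 2 2 58 0
392 72 37 132 40 10 9 5 56 2 2 75 0
311 60 65 163 0 40 142 0 45 2 2 72 0
393 63 52 140 0 10 142 0 46 2 2 90 0
"""


def parse_value(token):
    """Convert one cell to an int, treating 'NA' as a missing value."""
    return None if token == "NA" else int(token)


def parse_rows(raw):
    """Turn each non-empty line into a dict mapping variable name -> value."""
    cases = []
    for line in raw.splitlines():
        if line.strip():
            values = [parse_value(tok) for tok in line.split()]
            cases.append(dict(zip(COLUMNS, values)))
    return cases


cases = parse_rows(RAW_ROWS)

# Row 1: all variable values for subject #390.
row1 = cases[0]
print(row1)

# Column 3: systolic volume for the first 10 subjects.
sysvol = [case["SYSVOL"] for case in cases]
print(sysvol)
```

Reading a row gives every variable for one patient; reading a column gives one variable across patients. Missing cholesterol values (NA) are kept as `None` rather than dropped.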
Variables
ID: A patient identifier, used instead of a name.
EJEC: Ejection fraction, % of blood ejected from left ventricle in one beat.
SYSVOL: End-systolic volume, a measure of the size of the heart.
DIAVOL: End-diastolic volume.
OCCLU: Occlusion score (% of myocardium of the left ventricle supplied by arteries that are totally blocked).
STEN: Stenosis score (% supplied by arteries that are significantly narrowed but not completely blocked).
TIME: Time in months from when patient was admitted until OUTCOME.
OUTCOME: Coded variable
0 = alive at last follow up
1 = sudden cardiac death
2 = death within 30 days of heart attack
3 = death from heart failure
4 = death during / after coronary surgery
5 = non-cardiac death
SMOKE: Coded variable
1 = patient continued to smoke
2 = patient did not continue smoking
BETA: Coded variable
1 = patient took beta blockers
2 = patient did not take beta blockers
AGE: Patient’s age at admission (years)
CHOL: Blood cholesterol (mmoles/litre)
SURG: Coded variable
0 = no surgery
1 = surgery as part of a trial
2 = surgery for symptoms within 1 year
3 = surgery for symptoms within 1 to 5 years
4 = surgery for symptoms after 5 years
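Coded variables store a number that stands for a category, so the number itself has no numeric meaning. A minimal sketch of decoding, assuming the code lists above (the dictionary names and the `decode` helper are hypothetical, introduced only for illustration):

```python
# Code-to-label lookups taken from the variable descriptions above.
OUTCOME_LABELS = {
    0: "alive at last follow up",
    1: "sudden cardiac death",
    2: "death within 30 days of heart attack",
    3: "death from heart failure",
    4: "death during / after coronary surgery",
    5: "non-cardiac death",
}
SMOKE_LABELS = {1: "continued to smoke", 2: "did not continue smoking"}
SURG_LABELS = {
    0: "no surgery",
    1: "surgery as part of a trial",
    2: "surgery for symptoms within 1 year",
    3: "surgery for symptoms within 1 to 5 years",
    4: "surgery for symptoms after 5 years",
}


def decode(code, labels):
    """Return the category label for a code; flag missing or unknown codes."""
    if code is None:
        return "missing"
    return labels.get(code, f"unknown code {code}")


# Subject #392 in the table has OUTCOME = 5, SMOKE = 2 and SURG = 0.
print(decode(5, OUTCOME_LABELS))
print(decode(2, SMOKE_LABELS))
print(decode(0, SURG_LABELS))
```

Averaging such codes would be meaningless (an "average OUTCOME of 2.5" is not a category), which is why these variables are summarized by counts and percentages.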
In this presentation we will look at…
Tools to help us:
• explore: search for important features / messages
• communicate: report the important features / messages
Two types of tools:
• visual summaries: plots, graphs, charts, etc.
• numerical summaries: center, spread, percentages, frequencies, etc.
Types of Variables
A quantitative or numeric variable measures or counts something.
e.g. height of a student, number of sisters
A qualitative or categorical or nominal variable defines group membership.
e.g. gender, ethnicity
Quantitative/Numeric Variables
Continuous variables have no gaps between possible values. (measurements)
e.g. weight, temperature
Discrete variables have gaps between possible values. (counts)
e.g. number of brothers
Variables with few repeated values are treated as continuous.
Variables with many repeated values are treated as discrete.
Qualitative (Categorical) Variables
A categorical or nominal variable is one whose categories have no natural order.
e.g. ethnicity, gender
An ordinal variable is one where the categories can be ordered.
e.g. income group (low, middle, high); age group (young, old)
Likert scale, e.g. (1 = strongly disagree, …, 5 = strongly agree)
Types of Variables
Quantitative (measurements and counts)
  Continuous (few repeated values)
  Discrete (many repeated values)
Qualitative (define groups)
  Categorical/Nominal (no idea of order)
  Ordinal (fall in natural order)
Heart Attack Data in JMP
Variables
Classify each variable according to its type.
N = nominal, O = ordinal, C = continuous/discrete
ID: A patient identifier, used instead of a name. (N)
EJEC: Ejection fraction, % of blood ejected from left ventricle in one beat. (C)
SYSVOL: End-systolic volume, a measure of the size of the heart. (C)
DIAVOL: End-diastolic volume. (C)
OCCLU: Occlusion score (% of myocardium of the left ventricle supplied by arteries that are totally blocked). (C)
Variables
Classify each variable by type (C or O or N)
STEN: Stenosis score (% supplied by arteries that are significantly narrowed but not completely blocked). (C)
TIME: Time in months from when patient was admitted until OUTCOME. (C)
OUTCOME: Coded variable (N)
0 = alive at last follow up
1 = sudden cardiac death
2 = death within 30 days of heart attack
3 = death from heart failure
4 = death during / after coronary surgery
5 = non-cardiac death
Variables
Classify each variable by type (C or O or N)
SMOKE: Coded variable (N)
1 = patient continued to smoke
2 = patient did not continue smoking
BETA: Coded variable (N)
1 = patient took beta blockers
2 = patient did not take beta blockers
AGE: Patient’s age at admission (years) (C)
CHOL: Blood cholesterol (mmoles/litre) (C)
SURG: Coded variable (N, or possibly O?)
0 = no surgery
1 = surgery as part of a trial
2 = surgery for symptoms within 1 year
3 = surgery for symptoms within 1 to 5 years
4 = surgery for symptoms after 5 years
Data Types in JMP
Reporting Findings in Tables
1. Don’t try to do too much in the table. Model tables on published research.
2. Use white space effectively.
3. Make sure tables and text refer to each other. However, you do not need to restate everything in the table as text: if you interpret one or two key findings in a table, the reader should be able to handle the rest.
4. Use some aspect of the data to order and group the rows/columns in the table, e.g. size or chronology, or to show similarity or invite comparisons.
Reporting Findings in Tables
Example: Exercise 3, Grove
Comparisons between the Intervention group and the Control group are the focus here. The P column contains p-values from an appropriate test comparing the two groups on the given variables.
Reporting Findings in Tables
5. If appropriate, frame the table with summary statistics in rows and columns to provide a standard of comparison.
6. It is useful to round numbers in a table to one or two decimal places.
Example 2
The three tables below show six-monthly circulation figures for six weekly magazines in New Zealand.
Table 1: Circulation of Weekly Magazines
Period                  New Idea  Listener  Woman’s Day  Woman’s Weekly  Time    TV Guide
Jan 1 to Jun 30, 1999   67,070    90,521    165,914      126,640         38,136  241,356
Jul 1 to Dec 31, 1998   63,444    90,018    162,182      126,486         38,236  248,786
Jan 1 to Jun 30, 1998   59,039    92,786    175,002      129,920         38,635  258,806
We want to compare circulation figures between magazines.
It is easier to make circulation comparisons when the circulation data are in columns.
Table 2: Circulation of Weekly Magazines
Magazine         Jan 1 to Jun 30, 1999  Jul 1 to Dec 31, 1998  Jan 1 to Jun 30, 1998
New Idea         67,070                 63,444                 59,039
Listener         90,521                 90,018                 92,786
Woman’s Day      165,914                162,182                175,002
Woman’s Weekly   126,640                126,486                129,920
Time             38,136                 38,236                 38,635
TV Guide         241,356                248,786                258,806
Numbers need to be rounded.
Magazines need to be ordered by circulation.
Table 3: Circulation of Weekly Magazines (in thousands)
Magazine         Jan 1 to Jun 30, 1998  Jul 1 to Dec 31, 1998  Jan 1 to Jun 30, 1999  Average
TV Guide         259                    249                    241                    250
Woman’s Day      175                    162                    166                    168
Woman’s Weekly   130                    126                    127                    128
Listener         93                     90                     91                     91
New Idea         59                     63                     67                     63
Time             39                     38                     38                     38
Row averages allow comparisons between the most recent circulation data and the average for the magazine.
Table 3: Circulation of Weekly Magazines (in thousands)
Magazine         Jan 1 to Jun 30, 1998  Jul 1 to Dec 31, 1998  Jan 1 to Jun 30, 1999  Average
TV Guide         259                    249                    241                    250
Woman’s Day      175                    162                    166                    168
Woman’s Weekly   130                    126                    127                    128
Listener         93                     90                     91                     91
New Idea         59                     63                     67                     63
Time             39                     38                     38                     38
Average          126                    122                    122
Column averages allow comparisons between the circulation data and the average for the 6 magazines for the time period.
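The steps that turn Table 1 into Table 3 (put magazines in rows, round to thousands, order by size, frame with row and column averages) can be sketched in plain Python. The function and variable names are illustrative assumptions, and the averages are taken from the unrounded figures before rounding, which appears to be how the slide's averages were obtained.

```python
# Raw circulation per magazine, oldest period first (values from Table 1).
PERIODS = ["Jan-Jun 1998", "Jul-Dec 1998", "Jan-Jun 1999"]
CIRCULATION = {
    "New Idea": [59039, 63444, 67070],
    "Listener": [92786, 90018, 90521],
    "Woman's Day": [175002, 162182, 165914],
    "Woman's Weekly": [129920, 126486, 126640],
    "Time": [38635, 38236, 38136],
    "TV Guide": [258806, 248786, 241356],
}


def mean(values):
    return sum(values) / len(values)


def to_thousands(x):
    """Round a raw count to the nearest thousand (half rounds up)."""
    return int(x / 1000 + 0.5)


def build_table(circulation):
    """Return (rows, column_averages) with rows ordered by average size."""
    # Order magazines by average circulation, largest first.
    ordered = sorted(circulation.items(),
                     key=lambda item: mean(item[1]), reverse=True)
    rows = [(name,
             [to_thousands(v) for v in values],
             to_thousands(mean(values)))          # row average
            for name, values in ordered]
    n_periods = len(PERIODS)
    column_averages = [
        to_thousands(mean([values[i] for values in circulation.values()]))
        for i in range(n_periods)
    ]
    return rows, column_averages


rows, column_averages = build_table(CIRCULATION)
for name, values, average in rows:
    print(f"{name:<15}", *values, average)
print(f"{'Average':<15}", *column_averages)
```

Ordering by size and rounding make the main message (TV Guide has the highest circulation in every period) visible at a glance.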
Verbal Summary: During 1998 and the first half of 1999, the TV Guide had the highest circulation among weekly magazines in New Zealand.
Univariate Analyses
Variable type dictates how we display and summarize the distribution.
For nominal or ordinal data, the notion of distribution is typically the percentage of observations falling into each of the categories or ordered levels.
For numeric data, distribution refers to the shape of the distribution, its central tendency or “average”, and its variability or spread.
Displays for Numeric Variables
• Stem-and-Leaf Plots (simple, but outdated)
• Histograms & Smooth Density Estimates
• Quantile and Outlier Boxplots
17.4 Australia
20.1 Austria
10.1 Czechoslovakia
13.0 Denmark
13.1 W. Germany
21.1 Greece
10.3 Israel
10.4 Japan
10.5 Norway
14.6 Poland
15.7 Switzerland
18.6 United States
19.9 Belgium
12.5 Bulgaria
11.6 Finland
20.0 France
5.4 Hong Kong
17.1 Hungary
26.8 Kuwait
11.3 Netherlands
25.6 Portugal
12.6 Singapore
12.1 N. Ireland
12.0 Scotland
15.8 Canada
12.0 E. Germany
15.3 Ireland
20.1 New Zealand
9.8 Sweden
10.1 England & Wales
Data for 1983, 1984 or 1985 depending on the country (prior to reunification of Germany)
Traffic Death-Rates (per 100,000 population) for 30 Countries
Traffic Death-Rates (per 100,000 population) for 30 Countries
Units: 17 | 4 = 17.4 deaths per 100,000
 5 | 4
 6 |
 7 |
 8 |
 9 | 8
10 | 1 1 3 4 5
11 | 3 6
12 | 0 0 1 5 6
13 | 0 1
14 | 6
15 | 3 7 8
16 |
17 | 1 4
18 | 6
19 | 9
20 | 0 1 1
21 | 1
22 |
23 |
24 |
25 | 6
26 | 8
Collapse to 12 stems:
Traffic Death-Rates (per 100,000) for 30 Countries
Units: 1 | 7 = 17 deaths per 100,000
0 | 5
0 |
0 |
1 | 0 0 0 0 0 1 1
1 | 2 2 2 2 3 3 3
1 | 5 5
1 | 6 6 7 7
1 | 9
2 | 0 0 0 0 1
2 |
2 |
2 | 6 7
Stem-and-Leaf plot from JMP
Traffic Death Rate (per 100,000)
Stem  Leaf      Count
2     67        2
2
2
2     00001     5
1     9         1
1     6677      4
1     55        2
1     22223333  8
1     0000011   7
0
0
0     5         1
0|5 represents 5
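A stem-and-leaf plot like the first one above (integer part as stem, first decimal as leaf) can be sketched as follows. This is a hedged illustration rather than JMP's algorithm; `RATES` and `stem_and_leaf` are names assumed here, and the values are the 30 traffic death rates listed earlier.

```python
from collections import defaultdict

# Traffic death rates (per 100,000 population) for the 30 countries.
RATES = [17.4, 20.1, 10.1, 13.0, 13.1, 21.1, 10.3, 10.4, 10.5, 14.6,
         15.7, 18.6, 19.9, 12.5, 11.6, 20.0, 5.4, 17.1, 26.8, 11.3,
         25.6, 12.6, 12.1, 12.0, 15.8, 12.0, 15.3, 20.1, 9.8, 10.1]


def stem_and_leaf(values):
    """Return plot lines with stem = integer part, leaf = first decimal."""
    leaves = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)           # 17.4 -> 174
        stem, leaf = divmod(tenths, 10)  # 174 -> stem 17, leaf 4
        leaves[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {leaf_text}".rstrip())
    return lines


lines = stem_and_leaf(RATES)
print("Units: 17 | 4 = 17.4 deaths per 100,000")
print("\n".join(lines))
```

Keeping the empty stems (6 to 8 and 22 to 24) is a deliberate choice: they show the gaps that separate the lowest and highest rates from the rest.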
Histograms
Divide range of data into equal width class intervals and use the number or percentage of observations in each class interval to determine the height of a bar centered over each interval.
Traffic Death-Rates (per 100,000)
Class Interval   %
5 – 10           6.7
10 – 15          50.0
15 – 20          23.3
20 – 25          13.3
25 – 30          6.7
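The percentages in the class-interval table can be reproduced with a short sketch. It assumes each interval includes its lower endpoint and excludes its upper endpoint (so 20.0 falls in 20 – 25); the slides do not state the convention, and `class_percentages` is a name introduced here for illustration.

```python
# Traffic death rates (per 100,000 population) for the 30 countries.
RATES = [17.4, 20.1, 10.1, 13.0, 13.1, 21.1, 10.3, 10.4, 10.5, 14.6,
         15.7, 18.6, 19.9, 12.5, 11.6, 20.0, 5.4, 17.1, 26.8, 11.3,
         25.6, 12.6, 12.1, 12.0, 15.8, 12.0, 15.3, 20.1, 9.8, 10.1]


def class_percentages(values, start, width, n_classes):
    """Percentage of values in each equal-width interval [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [round(100 * c / len(values), 1) for c in counts]


percentages = class_percentages(RATES, start=5, width=5, n_classes=5)
for i, pct in enumerate(percentages):
    lower = 5 + 5 * i
    print(f"{lower:>2} - {lower + 5:<2}  {pct:>5.1f}")
```

These percentages are the bar heights of the histogram; half of the countries fall in the 10 – 15 interval.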
Histograms
An appropriate histogram should have 5-15 intervals.
Histograms are used when the sample size is moderate to large. Use n ≥ 50 as a guide.
Strengths of histograms:
Show the shape of the distribution.
Show gaps, outliers, clusters, groupings.
Histograms – Example 2
Birth weights (g) of infants born to smoking and nonsmoking mothers
Weights for Infants Born to Smokers
2557 2594 2600 2663 2665 2769 2769 2782 2821 2906 2920 2948 2948 2977 2977 2992
3005 3033 3042 3076 3076 3090 3132 3147 3203 3260 3303 3317 3321 3331 3374 3430
3444 3629 3637 3643 3651 3651 3756 3856 3884 3940 4238 709 1135 1790 1818 1885
1928 1928 1936 2084 2084 2125 2126 2187 2211 2225 2296 2296 2353 2367 2381 2381
2410 2410 2414 2424 2466 2466 2466 2495 2495
Weights for Infants Born to Nonsmokers
2523 2551 2622 2637 2637 2722 2733 2750 2750 2778 2807 2835 2835 2836 2863 2877
2877 2920 2920 2920 2977 2977 3062 3062 3062 3080 3090 3090 3100 3104 3175 3175
3203 3203 3225 3225 3232 3232 3234 3274 3274 3317 3317 3374 3402 3416 3459 3460
3473 3475 3487 3544 3600 3614 3614 3629 3651 3651 3699 3728 3770 3770 3770 3790
3799 3827 3860 3860 3884 3912 3941 3941 3969 3983 3997 3997 4054 4054 4111 4153
4167 4174 4593 4990 1021 1330 1474 1588 1588 1701 1729 1893 1899 1928 1970 2055
2055 2082 2100 2187 2240 2240 2282 2301 2325 2353 2381 2395 2438 2442 2450 2495
2495
Histograms – Example 2
We would like to compare birth weights of infants born to mothers who smoked during pregnancy to those of infants whose mothers did not.
What distributional differences, if any, do you see?
Histograms – Example 3
Auckland sunshine hours, January to April, 2000
[Histogram: frequency against daily sunshine (hours)]
No outliers or gaps. Two broad groupings (one group of days with little or no sun and another group of days with between 4 and 13 hours of sun).
Distributional Properties - Shape
(a) Unimodal
(b) Bimodal
(c) Trimodal
(d) Symmetric
(e) Positively or right skewed (long upper tail)
(f) Negatively or left skewed (long lower tail)
(g) Symmetric
(h) Bimodal with gap
(i) Exponential shape
Distributional Properties - Outliers
• Outliers: mistakes or something interesting/unusual.
(k) Outliers
Distributional Properties - Modality
• Existence of more than one peak: modality (unimodal, bimodal, etc.).
(b) Bimodal
(c) Trimodal
Distributional Properties - Skewness
• Shape of the distribution: symmetry, skewness.
(d) Symmetric, e.g. the normal distribution
(e) Positively or right skewed (long upper tail)
(f) Negatively or left skewed (long lower tail)
Distributional Properties – Central Tendency and Variability/Spread
• Central values and spread: What is the central value? How spread out are the values about the center?
Typical birth weight of infants born to nonsmokers is approx. 3000 g.
A majority of infants have birth weights within 500 g of what is typical.
Interpreting Stem-and-Leaf Plots and Histograms
• Be suspicious of abrupt changes.
(j) Spike in pattern
(l) Truncation plus outlier
Histograms – Example 4
Number of cigarettes smoked per day by WSU smokers
How would you characterize this distribution?
Features to look for in histograms and stem-and-leaf plots
• Outliers
• Existence of more than one peak
• Shape of the distribution
• Central values and spread
• Be suspicious of abrupt changes
Quantile and Outlier Boxplots
Birth weights of babies born to smoking mothers
[Boxplot with Q1, Med (median) and Q3 marked, and one outlier flagged]
Width of box represents the IQR, the interquartile range, which is the range of the middle 50% of the data.
x̄ = sample mean
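The quantities a boxplot displays can be computed directly. The sketch below reuses the traffic death rates from earlier rather than the birth weights, uses Python's `statistics.quantiles` with the "inclusive" method (quartile conventions differ between packages, so JMP may report slightly different values), and flags outliers with the common 1.5 × IQR rule, which is an assumption here since the slides do not state the rule used.

```python
import statistics

# Traffic death rates (per 100,000 population) for the 30 countries.
RATES = [17.4, 20.1, 10.1, 13.0, 13.1, 21.1, 10.3, 10.4, 10.5, 14.6,
         15.7, 18.6, 19.9, 12.5, 11.6, 20.0, 5.4, 17.1, 26.8, 11.3,
         25.6, 12.6, 12.1, 12.0, 15.8, 12.0, 15.3, 20.1, 9.8, 10.1]


def boxplot_summary(values):
    """Quartiles, IQR, mean, and points beyond the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # width of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = sorted(v for v in values
                      if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "mean": statistics.mean(values),
        "outliers": outliers,
    }


summary = boxplot_summary(RATES)
for name, value in summary.items():
    print(name, value)
```

For these data the mean lies above the median, which is consistent with the long upper tail seen in the stem-and-leaf plot.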
Boxplots are useful for comparing a numeric response variable across populations.
• Individual box plots can show outliers and skewness (example shown: right skewed data).
• A wide box plot with short whiskers could be coming from a bimodal distribution or a very short tailed distribution.
All three populations in this study have right skewed mean NFCS scores, with extreme outliers. The Baseline group seems to have the lowest scores.
Which plot do I use?
Choose plots that best display the features you see in the data.
Generally, look at several plots to see the most important features of your data.
Simple Plots for Continuous Variables
• Stem-and-leaf plots
• Histograms
• Boxplots
Types of Variables
Quantitative (measurements and counts)
  Continuous (few repeated values)
  Discrete or Ordinal (many repeated values)
Qualitative (define groups)
  Categorical (no idea of order)
  Ordinal (fall in natural order)
Repeated and Grouped Data
Repeated Data (i.e. Discrete Variables)
e.g. Years of Education, # of Children
Display Tools: Frequency table, bar graph
Frequency Table
Grove, Exercise 6:
The results of Katsma and Souza’s (2000) study are presented in tables on pg. 36. They contain both the nurses’ opinion regarding a patient’s self-reported pain assessment on a 10-pt. ordinal scale and what the nurses actually reported in the patient’s chart. There were two classifications of patients: smiling and grimacing.
Frequency Table
Frequency Table for Nurse’s Opinion of Patient’s Self-Reported Pain Score (smiling patients)
Pain Assessment Scale (xi)   Frequency (fi)   Percentage (fi / n) x 100   Cumulative %
0                            7                8.1                         8.1
1                            7                8.1                         16.2
2                            5                5.8                         22.0
3                            8                9.4                         31.4
4                            10               11.6                        43.0
5                            11               12.8                        55.8
6                            5                5.8                         61.6
7                            2                2.3                         63.9
8                            31               36.1                        100.0
9                            0                0.0                         100.0
10                           0                0.0                         100.0
                             n = 86           100.0
Frequency Table
Has the columns:
value xi: each distinct value in the sample
frequency fi: how often each value occurs
percentage (fi / n) x 100: percentage of the sample with that value
cumulative percentage: percentage of the sample with value xi or less
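The four columns can be computed from the frequencies alone. This is a minimal sketch with assumed names (`PAIN_FREQUENCIES`, `frequency_table`), using the counts from the pain-score table. Because it rounds each percentage directly from the counts, a few last digits differ slightly from the slide's table (for example 36.0 rather than 36.1 for a score of 8).

```python
# Frequency of each pain assessment score (smiling patients).
PAIN_FREQUENCIES = {0: 7, 1: 7, 2: 5, 3: 8, 4: 10, 5: 11,
                    6: 5, 7: 2, 8: 31, 9: 0, 10: 0}


def frequency_table(frequencies):
    """Return n and rows of (value, frequency, percentage, cumulative %)."""
    n = sum(frequencies.values())
    rows = []
    running_total = 0
    for value in sorted(frequencies):
        f = frequencies[value]
        running_total += f
        percentage = round(100 * f / n, 1)
        # Cumulate the counts, not the rounded percentages, so that
        # rounding error does not build up down the column.
        cumulative = round(100 * running_total / n, 1)
        rows.append((value, f, percentage, cumulative))
    return n, rows


n, rows = frequency_table(PAIN_FREQUENCIES)
print("x   f   %     cum %")
for value, f, percentage, cumulative in rows:
    print(f"{value:<3} {f:<3} {percentage:<5} {cumulative}")
print("n =", n)
```

Reading the cumulative column at a score of 5 shows that a little over half of the nurses gave a score of 5 or less.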
31 of the 86 nurses had the same opinion as the patient.
(31 / 86) x 100% ≈ 36% of the nurses agreed with the patient’s score.
100.0% of the nurses felt the pain score was at or below the patient’s score.
Bar Graph
Similar to histogram (for continuous data), except bars / rectangles are not necessarily joined up.
Data Entered into JMP (with frequencies)
Frequency Tables & Bar Graphs
Computing
Frequency tables are produced from raw data in JMP under Analyze > Distribution.
Be sure to tell JMP that the frequencies have been entered and should be interpreted as such.
Bar Graph and Frequency Table in JMP
Qualitative/Categorical/Nominal Variables
Display Tools: Frequency table, bar graph
Frequency Table
Used in exactly the same way as for discrete variables.
Order categories by size (i.e. by frequency unless there is some very compelling reason for some other ordering).
Qualitative Variables: Bar Graph
Bar Graph for the variable SURG
[Bar graph: percentage (0% to 60%) for each SURG category, 0 to 4]
SURG:
0: No surgery
1: Surgery as part of trial
2: Surgery for symptoms within 1 year
3: Surgery for symptoms within 1 to 5 years
4: Surgery for symptoms after 5 years
Categorical/Nominal Variables: Frequency Table
Frequency Table for the variable SURG
Category                         SURG  Frequency  Percentage  Cumulative percentage
No surgery performed             0     409        66.4        66.4
Surg. as part of trial           1     89         14.4        80.8
Surg. for sympt. within 1 year   2     72         11.7        92.5
Surg. for sympt. 1 to 5 years    3     29         4.7         97.2
Surg. for sympt. > 5 years       4     17         2.8         100.0
Total                                  616        100.0
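For a categorical variable, the same counts drive both the frequency table and the bar graph. The sketch below orders the categories by frequency, as recommended above, and draws a rough text bar graph. The names `SURG_COUNTS` and `text_bar_graph` and the bar width are illustrative assumptions; JMP produces the real graph.

```python
# Counts for each SURG category, from the frequency table.
SURG_COUNTS = {
    "No surgery": 409,
    "Surgery as part of a trial": 89,
    "Surgery for symptoms within 1 year": 72,
    "Surgery for symptoms within 1 to 5 years": 29,
    "Surgery for symptoms after 5 years": 17,
}


def text_bar_graph(counts, width=40):
    """One line per category, largest first, bar length ~ percentage."""
    n = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in ordered:
        percentage = 100 * count / n
        bar = "#" * round(percentage / 100 * width)
        lines.append(f"{label:<{label_width}}  {bar} {percentage:.1f}%")
    return lines


bars = text_bar_graph(SURG_COUNTS)
print("\n".join(bars))
```

The bars are separated rather than joined because the categories are groups, not intervals on a numeric scale.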
Heart Attack Data in JMP
Bar Graph for Surgery Variable in JMP
Computing
Frequency tables are produced from raw data in JMP under Analyze > Distribution.
Notice that there is no frequency column in this data table. That is because the data were entered so that each row represents one subject in the study.
The Big Mac Index
In 1986 The Economist started to compare prices of Big Macs between countries (converted to US dollars).
This provides a measure of whether the currency is undervalued or overvalued compared to the United States dollar.
The Big Mac Index
[Bar graph: Price of Big Macs ($US), from 0 to 4, by country (Israel, Japan, France, Taiwan, Singapore, New Zealand, Hong Kong), with the USA price marked]
More General Use of Bar Graphs
• Excellent for relating labels to relative importance or relative size.
• Can be used to display a quantitative variable other than frequency (e.g. time, amount of money).
• Where possible, order items by size.
Other Forms of Graphs
• Pie chart (for displaying the “measurement” of each object as a proportion of the total)
• Segmented bar graph (same purpose as the pie chart)
Percentages of the World's Gold Production
Country     1983  1985  1987  1989  1991
S. Africa   48.6  43.8  36.2  30.8  28.7
U.S.        4.4   5.0   9.3   13.4  13.9
USSR        19.1  17.7  16.7  14.4  11.5
Australia   2.2   3.8   6.7   10.3  11.2
Canada      5.3   5.7   7.0   8.0   8.3
China       4.1   4.0   4.3   4.0   5.7
Rest        16.3  20.2  19.7  19.0  20.8
[Figure: (a) Bar graph, (b) Pie chart and (c) Segmented bar of gold production percentages. Pie chart labels: S. Africa 29%, U.S. 14%, USSR 11%, Australia 11%, Canada 8%, China 6%, Rest 21%]
Choosing between Types of Graphs
• Bar graphs better at presenting relative sizes.
• Pie charts do not communicate information as well.
• Perspective pie charts are disastrous!
• Avoid using perspective bar graphs.
[Figure: pie charts and a bar graph of the percentages for groups A to F]
Some Principles of Graphical Excellence
• A well-designed presentation of interesting data. A matter of substance, of statistics, and of design.
• Communicates complex ideas with clarity, precision and efficiency.
• Gives the viewer the greatest number of ideas in the shortest possible time.
• Tells the truth about the data.
(E. R. Tufte, The Visual Display of Quantitative Information)
Graphical Displays for Data on a Single Variable
Quantitative/numeric, continuous: Histogram, box plot, stem-and-leaf plot
Discrete or Ordinal: Frequency table, bar graph
Qualitative, Categorical or Nominal: Frequency table, bar graph, pie chart, or mosaic plot