Day1 Graphics 2x2
8/10/2019 Day1 Graphics 2x2
UNIVERSITY OF COPENHAGEN
DEPARTMENT OF BIOSTATISTICS
Graphics in Stata
Klaus K. Holst
29 Sep 2014
[Figure: "U.S. Life Expectancy", subtitle 1900-1999: life expectancy of males and females against Year; note "Data 1900-1999", caption "Life expectancy by gender"]
Graphs in Stata
The main command is graph, followed by the type of graph:
graph twoway     scatter plots, line plots
graph matrix     scatterplot matrices
graph bar        bar charts
graph dot        dot charts
graph box        box-and-whisker plots
graph pie        pie charts
graph save       save a graph to disk
graph use        display a saved graph
graph combine    combine several graphs into one
plus more specialized graphs: histogram, kdensity, avplot, ...
graph twoway scatter ...
// or
twoway scatter ...
// or
scatter ...
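Here is a minimal sketch of graph save, graph use and graph combine from the list above. It assumes the uslifeexp data introduced next. The graph names gmale and gfemale and the file name lifeexp_combined are hypothetical:

```stata
sysuse uslifeexp, clear

* draw two graphs and keep them in memory under the given names
twoway line le_male year, name(gmale, replace)
twoway line le_female year, name(gfemale, replace)

* put the two stored graphs side by side in one graph
graph combine gmale gfemale, rows(1)

* save the current (combined) graph to disk as lifeexp_combined.gph ...
graph save lifeexp_combined, replace

* ... and display it again later, e.g. in a new session
graph use lifeexp_combined
```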
U.S. life expectancy data, 1900-1999
clear
sysuse uslifeexp
describe

Contains data from /Applications/Stata/ado/base/u/uslifeexp.dta
  obs:           100                          U.S. life expectancy, 1900-1999
 vars:            10                          30 Mar 2011 04:31
 size:         3,800                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
year            int     %9.0g                 Year
le              float   %9.0g                 life expectancy
le_male         float   %9.0g                 Life expectancy, males
le_female       float   %9.0g                 Life expectancy, females
le_w            float   %9.0g                 Life expectancy, whites
le_wmale        float   %9.0g                 Life expectancy, white males
le_wfemale      float   %9.0g                 Life expectancy, white females
le_b            float   %9.0g                 Life expectancy, blacks
le_bmale        float   %9.0g                 Life expectancy, black males
le_bfemale      float   %9.0g                 Life expectancy, black females
-------------------------------------------------------------------------------
Sorted by: year
Scatter plots
twoway scatter le year
[Figure: scatter plot of life expectancy against Year, 1900-2000]
Scatter plots
twoway spike le year
[Figure: spike plot of life expectancy against Year]
Line plots
twoway line le year
[Figure: line plot of life expectancy against Year]
Line plots
twoway line le year, connect(stairstep)
[Figure: stairstep line plot of life expectancy against Year]
Schemes
A scheme specifies the overall look of the graph. To change it for the session:
set scheme s2mono
To list the available schemes:
graph query, schemes
Available schemes are
    s2color      see help scheme_s2color
    s2mono       see help scheme_s2mono
    s2manual     see help scheme_s2manual
    s2gmanual    see help scheme_s2gmanual
    s2gcolor     see help scheme_s2gcolor
    s1color      see help scheme_s1color
    s1mono       see help scheme_s1mono
    s1rcolor     see help scheme_s1rcolor
    s1manual     see help scheme_s1manual
    sj           see help scheme_sj
    economist    see help scheme_economist
    s2color8     see help scheme_s2color8
    lean1        see help scheme_lean1
    lean2        see help scheme_lean2
    rbn1mono     see help scheme_rbn1mono
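This short sketch shows three ways of choosing a scheme, assuming the uslifeexp data:

- set scheme applies to the rest of the session.
- The scheme() option applies to a single graph.
- The permanent option also keeps the choice for future sessions. It is commented out because it changes your saved settings.

```stata
sysuse uslifeexp, clear

* session-wide: all following graphs use s2mono
set scheme s2mono
display c(scheme)                        // the scheme currently in effect

twoway line le year                      // drawn with s2mono
twoway line le year, scheme(s1color)     // s1color for this graph only

* to keep a scheme across sessions as well:
* set scheme s2mono, permanent
```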
Schemes
twoway line le year, scheme(s1color)
[Figure: line plot of life expectancy against Year, drawn with scheme s1color]
Multiple graphs in one
Overlay multiple twoway graphs (enclose each in parentheses, or separate them by ||):
twoway (line le_male year) (line le_female year)
[Figure: life expectancy against Year; legend: Life expectancy, males; Life expectancy, females; no y-axis title]
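The same overlay can be written with the || separator instead of parentheses. Different plot types can be mixed in the same way. This is a sketch that assumes the uslifeexp data:

```stata
sysuse uslifeexp, clear

* equivalent to: twoway (line le_male year) (line le_female year)
twoway line le_male year || line le_female year

* a line plot and a scatter plot in the same graph
twoway line le_male year || scatter le_female year
```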
Graphics options
Stata can produce publication-quality graphs in a few steps:
Axis limits are chosen automatically.
Legends are added automatically.
Axis titles are taken from the variable labels.
To change these defaults we add options (everything after the comma).
General syntax of the scatter command:
twoway scatter varlist [if] [in] [, options]
Options are divided into options for the subgraph (here scatter), e.g.
marker_options         change the look of markers (colour, size, etc.)
connect_options        change the look of lines or the connecting method
axis_choice_options    associate the plot with an alternative axis
and options for the overall graph command:
twoway_options         by, name, titles, legends, axes, etc.
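This sketch shows the two kinds of options side by side, assuming the uslifeexp data:

- msymbol() and mcolor() are marker options. They belong to the scatter subgraph inside the parentheses.
- title() and xtitle() come after the last comma. They are twoway options for the whole graph.

The restriction to 1950 onwards is only for illustration. The /// line continuation works in do-files, not at the command prompt.

```stata
sysuse uslifeexp, clear

* subgraph options inside the parentheses, twoway options after the last comma
twoway (scatter le year if year >= 1950, msymbol(Oh) mcolor(navy)), ///
    title("U.S. life expectancy since 1950") xtitle("Calendar year")
```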
Graphics options
General syntax:
twoway (line ..., line_options) ///
(scatter ..., scatter_options) ///
(lfit ..., lfit_options), twoway_options
In the previous plot the y-axis title disappeared, so we add it with ytitle():
twoway (line le_male year) (line le_female year), ytitle(Life expectancy in years)
Some twoway options:
by(varlist, ...)       repeat for subgroups
nodraw                 suppress display of graph
name(name, ...)        name of the graph (kept in memory)
scheme(schemename)     overall look
xtitle, ytitle         axis titles
xlabel, ylabel         axis label positions
legend                 legend options
title, subtitle        graph title and subtitle
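This sketch combines several of these twoway options, assuming the uslifeexp data. The graph name lifeexp_gender is hypothetical. nodraw builds the graph without showing it, and graph display shows the stored graph afterwards.

```stata
sysuse uslifeexp, clear

* build the graph without drawing it and store it under a name
twoway (line le_male year) (line le_female year), ///
    ytitle("Life expectancy in years") ///
    xlabel(1900(20)2000) ylabel(30(10)80, angle(0)) ///
    legend(order(1 "Males" 2 "Females")) ///
    name(lifeexp_gender, replace) nodraw

* show the stored graph
graph display lifeexp_gender
```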
Graphics options
scatter marker options:
msymbol(symbolstylelist)       shape of marker
mcolor(colorstylelist)         colour of marker, inside and out
msize(markersizestylelist)     size of marker
mfcolor(colorstylelist)        inside or "fill" colour
mlcolor(colorstylelist)        colour of outline
mlwidth(linewidthstylelist)    thickness of outline
..., jitter options, connect options, label options, ...
line options:
connect(connectstyle)          how to connect points
sort[(varlist)]                how to sort before connecting
cmissing(y/n)                  whether missing values are ignored (y) or break the line (n)
lpattern(linepatternstyle)     line pattern (solid, dashed, etc.)
lwidth(linewidthstyle)         thickness of line
lcolor(colorstyle)             colour of line
lstyle(linestyle)              overall style of line
...
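This sketch uses some of the marker and line options above, assuming the uslifeexp data. Males are drawn as small white-filled circles with a dark navy outline. Females are drawn as a thicker dashed dark orange line.

```stata
sysuse uslifeexp, clear

twoway (scatter le_male year, msymbol(O) mfcolor(white) mlcolor(dknavy) msize(small)) ///
       (line le_female year, lpattern(dash) lwidth(medthick) lcolor(dkorange))
```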
Graphics options
twoway (line le_male year) (line le_female year), ytitle(Life expectancy in years)
[Figure: life expectancy of males and females against Year, with y-axis title "Life expectancy in years"]
Graphics options
If the axis options are going to be reused many times, we may store them in a macro:
local gopt ytitle(Life expectancy in years) xlabel(1875(25)2025) title(U.S. Life Expectancy)
where we also add a little more space to the x-axis and a title:
twoway (line le_male year) (line le_female year), `gopt'
... and some different line types and colours:
local maleline "line le_male year, color(dknavy) lpattern(solid) connect(stairstep)"
local femaleline "line le_female year, color(dkorange) lpattern(dash_dot) connect(stairstep)"
twoway (`maleline') (`femaleline'), `gopt'
Digression: Macros
A macro is an alias that can be dereferenced everywhere in the program(!):
local a 1
local b a b c
global b "Hello"
Local macros live only within the scope where they were defined (i.e. the do-file or program/function); global macros are visible everywhere. A local is dereferenced as `name' and a global as $name:
di `a'
di "$b `b'"
1
Hello a b c
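This small sketch demonstrates the scope rule, assuming the code runs from a do-file. The program name showscope is hypothetical. The local a defined in the do-file should not be visible inside the program, so its brackets come out empty. The global b is visible.

```stata
capture program drop showscope
program define showscope
    * macros are expanded when the program runs, inside its own scope
    display "local a inside the program:  [`a']"
    display "global b inside the program: [$b]"
end

local a 1
global b "Hello"
showscope
```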
To evaluate an expression and store its result in a macro, use =:
local a = `a'+1
di `a'
2
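This sketch shows the difference the = makes. Without it, the text is stored exactly as typed. With it, the expression is evaluated first.

```stata
local x 2+2        // without =: the text "2+2" is stored
local y = 2+2      // with =: the expression is evaluated and 4 is stored
display "`x'"      // shows 2+2
display "`y'"      // shows 4
display `x'        // the text 2+2 is substituted, then evaluated by display: 4
```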
Digression: Macros
Meta-programming with macros
capture drop x1 x2
input x1 x2
1 3
2 4
end
local idx 1 2
foreach i in `idx' {
    list x`i' in 1/2
}

     +----+
     | x1 |
     |----|
  1. |  1 |
  2. |  2 |
     +----+

     +----+
     | x2 |
     |----|
  1. |  3 |
  2. |  4 |
     +----+
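The same technique works for graphs. This sketch assumes the uslifeexp data and takes four steps:

1. Loop over variables with foreach ... of varlist.
2. Name each graph after the loop variable.
3. Collect the names in a local.
4. Combine the graphs.

The g_ prefix and the local name graphs are hypothetical.

```stata
sysuse uslifeexp, clear

local graphs                                   // start with an empty list of names
foreach v of varlist le_male le_female {
    twoway line `v' year, name(g_`v', replace)
    local graphs `graphs' g_`v'                // append the name just created
}
display "`graphs'"
graph combine `graphs'
```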
Graphics options
twoway (`maleline') (`femaleline'), `gopt'
[Figure: "U.S. Life Expectancy": males and females drawn as stairstep lines with different patterns and colours; x-axis Year from 1875 to 2025, y-axis "Life expectancy in years"]
Line types
palette linepalette
[Figure: Line pattern palette: solid, dash, longdash_dot, dot, longdash, dash_dot, shortdash, shortdash_dot, blank]
Symbols
palette symbolpalette
[Figure: Symbol palette: O Oh o oh, D Dh d dh, T Th t th, S Sh s sh, + smplus, X x, p]
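This sketch uses names from the two palettes above, assuming the uslifeexp data. The connected plot type draws both markers and lines. The mod(year, 10) == 0 condition keeps one point per decade, only so that the markers stay readable.

```stata
sysuse uslifeexp, clear

* hollow diamonds with a dashed line for males, triangles with longdash_dot for females
twoway (connected le_male year if mod(year, 10) == 0, msymbol(Dh) lpattern(dash)) ///
       (connected le_female year if mod(year, 10) == 0, msymbol(T) lpattern(longdash_dot))
```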
Colours
Colours can be customized with a single-line style file color-colname.style placed in the ado-path, containing e.g.:
set rgb "255 100 50"
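This sketch shows two alternatives to writing a style file, assuming the uslifeexp data:

- palette color previews named colours.
- An RGB triplet in quotes can be used directly wherever a colour is expected.

A named colour such as myorange (hypothetical) would need its own color-myorange.style file on the ado-path.

```stata
sysuse uslifeexp, clear

* preview two named colours side by side
palette color dknavy dkorange

* use an RGB triplet directly instead of a colour name
twoway line le year, lcolor("255 100 50")
```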
Titles, plot region
[Figure: placement of titles around the plot region, with ring positions (ringposstyle) counted outward from the plot region (0): t1title 1, t2title 2, subtitle 6, title 7 above; b1title 1, b2title 2, legend 3, note 4, caption 5 below; l1title 1, l2title 2 to the left; r1title 1, r2title 2 to the right]
Where the titles are located is controlled by the scheme; for details see
help title_options
Titles
#delimit ;
twoway (`maleline') (`femaleline'),
    title("U.S. Life Expectancy")
    subtitle("1900-1999")
    caption("Life expectancy by gender")
    note("Data 1900-1999")
    legend(col(1) ring(0) position(11))
    yscale(range(30 90))
    ylabel(30(10)90)
    name(lifeexptitles, replace);
#delimit cr
Positions are given as clock positions (12 = top, 3 = right, 6 = bottom, 9 = left, 0 = centre); ring(0) places the legend inside the plot region.
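The same kind of graph can be written without #delimit by ending each continued line with ///. This sketch assumes the uslifeexp data and also moves the legend to 5 o'clock inside the plot region. The graph name lifeexp_pos5 is hypothetical.

```stata
sysuse uslifeexp, clear

* /// continues the command on the next line (do-files only)
twoway (line le_male year) (line le_female year), ///
    title("U.S. Life Expectancy") subtitle("1900-1999") ///
    legend(cols(1) ring(0) position(5)) ///
    name(lifeexp_pos5, replace)
```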
Titles
[Figure: "U.S. Life Expectancy", subtitle "1900-1999": males and females against Year with the legend in one column inside the plot at 11 o'clock, y-axis from 30 to 90; note "Data 1900-1999", caption "Life expectancy by gender"]
Example 2, World Life Expectancy 1998
sysuse lifeexp, clear
describe

Contains data from /Applications/Stata/ado/base/l/lifeexp.dta
  obs:            68                          Life expectancy, 1998
 vars:             6                          26 Mar 2011 09:40
 size:         2,652                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
region          byte    %12.0g     region     Region
country         str28   %28s                  Country
popgrowth       float   %9.0g               * Avg. annual % growth
lexp            byte    %9.0g               * Life expectancy at birth
gnppc           float   %9.0g               * GNP per capita
safewater       byte    %9.0g               *
                                            * indicated variables have notes
-------------------------------------------------------------------------------
Sorted by:
Scatter plot matrices
graph matrix gnppc popgrowth lexp safewater, ///
    half note("")
[Figure: lower half of the scatterplot matrix of GNP per capita, Avg. annual % growth, Life expectancy at birth and safewater]
Scatter plots
Plotting the association between GNP per capita (on a log scale) and life expectancy in North America and South America, with different colours:
twoway (scatter lexp gnppc if region==2, mcolor(dkorange) msize(0.8)) ///
       (scatter lexp gnppc if region==3, mcolor(dknavy) msize(0.8)), ///
       xscale(log) xlabel(1000 2000 4000 8000 16000, angle(90)) ///
       legend(order(1 "North America" 2 "South America"))
Subsetting
Notice that we have used the if qualifier together with logical expressions to subset the scatter plots. This works with every graph command!
Stratification
We can also use the by option to make separate plots for different levels of a third variable.
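This sketch shows the if and in qualifiers and the by() option, assuming the lifeexp data. label list shows which numeric code belongs to each region label. The cut-off 10000 is arbitrary.

```stata
sysuse lifeexp, clear

label list region                                   // codes behind the region labels
scatter lexp gnppc if region == 2 & gnppc < 10000   // conditions combined with & (and), | (or)
scatter lexp gnppc in 1/20                          // in: observations 1 to 20 only
scatter lexp gnppc, by(region)                      // by: one panel per region
```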
Scatter plots
From the overlay plot using if conditions:
[Figure: life expectancy at birth against GNP per capita (log scale, labels 1000 to 16000); legend: North America, South America]
Scatter plots, by
twoway (scatter lexp gnppc), ///
    by(region, row(1)) xsize(10)
[Figure: life expectancy at birth against GNP per capita in three panels, Eur & C.Asia, N.A. and S.A.; "Graphs by Region"]
Scatter plots
Labels can be added to the points (here msymbol(i) also hides the markers themselves):
scatter lexp gnppc if region==2, ///
    mlabsize(2) mlabel(country) mlabposition(0) msymbol(i)
[Figure: North American countries (Canada, Dominican Republic, El Salvador, Guatemala, Haiti, Honduras, Jamaica, Mexico, Nicaragua, Panama, Trinidad and Tobago, United States) shown by their names at their GNP per capita and life expectancy at birth]
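When labels collide, their position can be set per observation with mlabvposition(). This sketch assumes the lifeexp data. The helper variable pos is hypothetical. It puts the Trinidad and Tobago label at 9 o'clock and all other labels at 3 o'clock.

```stata
sysuse lifeexp, clear

* clock position of each label, stored in a helper variable
generate byte pos = 3
replace pos = 9 if country == "Trinidad and Tobago"

scatter lexp gnppc if region == 2, mlabel(country) mlabvposition(pos) msymbol(Oh)
```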
Scatter plots, point sizes
The size of the points can depend on another variable, given as a weight:
scatter lexp gnppc if region==2 [pweight=popgrowth], msymbol(Oh)
[Figure: life expectancy at birth against GNP per capita for North America, hollow circles sized by popgrowth]
Curve fits
Linear (lfit, lfitci) or quadratic (qfit, qfitci) regression fits; the ci versions add a confidence band:
twoway (lfitci lexp safewater) (scatter lexp safewater)
[Figure: life expectancy at birth against safewater with fitted line and 95% confidence band; legend: 95% CI, Fitted values, Life expectancy at birth]
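This sketch relates the fits to regression, assuming the lifeexp data. regress reports the coefficients of the straight line that lfit draws, and qfitci gives the quadratic fit with its band. The legend order assumes that qfitci contributes two keys (band first, then line), matching the legend of the linear fit plot.

```stata
sysuse lifeexp, clear

* coefficients of the least-squares line of lexp on safewater (the line lfit draws)
regress lexp safewater

* quadratic fit with confidence band; observed points drawn on top
twoway (qfitci lexp safewater) (scatter lexp safewater), ///
    legend(order(3 "Observed" 2 "Quadratic fit" 1 "95% CI"))
```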
Bar plots
graph bar (mean) lexp (p50) lexp (mean) safewater (p50) safewater, ///
    over(region)
[Figure: bars per region (Eur & C.Asia, N.A., S.A.) for mean of lexp, p 50 of lexp, mean of safewater and p 50 of safewater]
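The default legend keys (mean of lexp, p 50 of lexp) can be relabelled, and the bars can carry their values. This sketch assumes the lifeexp data, and the display format is arbitrary.

```stata
sysuse lifeexp, clear

* relabel the legend keys and print each bar's value on the bar
graph bar (mean) lexp (p50) lexp, over(region) ///
    legend(order(1 "Mean" 2 "Median")) ///
    blabel(bar, format(%4.1f)) ///
    ytitle("Life expectancy at birth (years)")
```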
Box plots
Box-and-whisker plots
They give a quick summary of the marginal distribution of continuous variables and are useful for a quick overview of skewness, potential outliers, etc. for many variables.
graph box lexp, over(region) marker(1, mlabel(country))
The box limits are the 25% and 75% quantiles, with the median marked as a line in the box. The whiskers extend to the most extreme observations within 1.5 IQR of the box limits; observations further out are plotted individually.
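This sketch applies the whisker rule to one region. It assumes the lifeexp data, and that graph box uses the same quartiles as summarize, detail. Region 3 (S.A.) is only an example. The listed observations should be the ones drawn as separate points.

```stata
sysuse lifeexp, clear

quietly summarize lexp if region == 3, detail
local iqr   = r(p75) - r(p25)          // interquartile range
local lower = r(p25) - 1.5*`iqr'       // lowest value a whisker may reach
local upper = r(p75) + 1.5*`iqr'       // highest value a whisker may reach
display "box: " r(p25) " to " r(p75) ", whisker limits: " `lower' " to " `upper'

* observations outside the limits
list country lexp if region == 3 & (lexp < `lower' | lexp > `upper')
```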
Box plots
graph box lexp, over(region) marker(1, mlabel(country))
[Figure: box plots of life expectancy at birth by region (Eur & C.Asia, N.A., S.A.); outlying points labelled Haiti and Bolivia]
Histograms, density estimation
Histograms can be generated with either of the syntaxes
histogram lexp, bin(#)
histogram lexp, width(#)
Selecting the width or the number of bins can be difficult. By default the number of bins k is chosen ad hoc from the number of observations n:
k = min{ sqrt(n), 10*log10(n) }
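The default can be computed by hand. This sketch assumes the lifeexp data. It also assumes that Stata uses the integer part of k. The note (bin=..., start=..., width=...) that histogram prints allows a check.

```stata
sysuse lifeexp, clear

quietly count if !missing(lexp)
local n = r(N)
local k = min(sqrt(`n'), 10*log10(`n'))
display "n = `n', k = " `k' " (integer part " floor(`k') ")"

* compare with the bin= note printed by histogram
histogram lexp
```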
We can overlay a normal approximation (option normal) or a non-parametric kernel density estimate (option kdensity); there is also a separate kdensity command.
histogram lexp, normal normopt(lpattern(dot))
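This sketch shows the two kernel density alternatives, assuming the lifeexp data:

- The kdensity option of histogram, with kdenopts() for the line style.
- The separate kdensity command, with a normal density overlaid.

```stata
sysuse lifeexp, clear

* kernel density estimate drawn as a dashed line on top of the histogram
histogram lexp, kdensity kdenopts(lpattern(dash))

* separate command; the normal option overlays a normal density for comparison
kdensity lexp, normal
```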
Histograms
histogram lexp, normal normopt(lpattern(dot))
[Figure: histogram of life expectancy at birth (density scale) with a dotted normal density overlaid]
QQ-plots
Comparison with the theoretical quantiles of a normal distribution:
qnorm lexp
[Figure: life expectancy at birth against Inverse Normal quantiles]
QQ-plots
capture drop z
gen z = rnormal()
qnorm z
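This sketch contrasts normal and skewed simulated data. The seed, the sample size of 200 and the log-normal comparison are arbitrary choices for illustration. Normal data should lie close to the line, while the right-skewed data should bend away from it.

```stata
clear
set seed 2014                 // arbitrary seed, only for reproducibility
set obs 200
generate z = rnormal()        // normal data
generate w = exp(rnormal())   // log-normal, i.e. right-skewed, data

qnorm z, name(qq_normal, replace)
qnorm w, name(qq_skewed, replace)
graph combine qq_normal qq_skewed
```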