Day1 Graphics 2x2

  • 8/10/2019 Day1 Graphics 2x2

    UNIVERSITY OF COPENHAGEN

    DEPARTMENT OF BIOSTATISTICS

    Graphics in Stata

    Klaus K. Holst

    29 Sep 2014

    [Figure: "U.S. Life Expectancy", subtitle "1900-1999". Life expectancy for males and females (y-axis, 30 to 90) against year (1900 to 2000). Caption: "Life expectancy by gender". Note: "Data 1900-1999".]

    Graphs in Stata

    The main command is graph, followed by the type of graph:

    graph twoway     scatter plots, line plots
    graph matrix     scatterplot matrices
    graph bar        bar charts
    graph dot        dot charts
    graph box        box-and-whisker plots
    graph pie        pie charts
    graph save
    graph use
    graph combine

    plus more specialized graphs: histogram, kdensity, avplot, ...

    graph twoway scatter ...
    // or
    twoway scatter ...
    // or
    scatter ...
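
    The slide lists graph save, graph use and graph combine without an example. A minimal sketch of how they might fit together, assuming the uslifeexp data shipped with Stata; the graph names g_male, g_female, g_both and the file name lifeexp_both.gph are hypothetical:

    ```stata
    sysuse uslifeexp, clear

    * Draw two graphs and keep them in memory under hypothetical names
    twoway line le_male year, name(g_male, replace)
    twoway line le_female year, name(g_female, replace)

    * Place the two named graphs side by side
    graph combine g_male g_female, name(g_both, replace)

    * Save the combined graph to disk and load it again later
    graph save g_both "lifeexp_both.gph", replace
    graph use "lifeexp_both.gph"
    ```

    The file is written to the current working directory unless a path is given.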

    U.S. life expectancy data, 1900-1999

    clear
    sysuse uslifeexp

    describe

    Contains data from /Applications/Stata/ado/base/u/uslifeexp.dta
      obs:           100                          U.S. life expectancy, 1900-1999
     vars:            10                          30 Mar 2011 04:31
     size:         3,800                          (_dta has notes)
    -------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------
    year            int     %9.0g                 Year
    le              float   %9.0g                 life expectancy
    le_male         float   %9.0g                 Life expectancy, males
    le_female       float   %9.0g                 Life expectancy, females
    le_w            float   %9.0g                 Life expectancy, whites
    le_wmale        float   %9.0g                 Life expectancy, white males
    le_wfemale      float   %9.0g                 Life expectancy, white females
    le_b            float   %9.0g                 Life expectancy, blacks
    le_bmale        float   %9.0g                 Life expectancy, black males
    le_bfemale      float   %9.0g                 Life expectancy, black females
    -------------------------------------------------------------------------------
    Sorted by: year

    Scatter plots

    twoway scatter le year

    [Figure: scatter plot of life expectancy (40 to 80) against year (1900 to 2000).]

    Scatter plots

    twoway spike le year

    [Figure: spike plot of life expectancy (40 to 80) against year (1900 to 2000).]

    Line plots

    twoway line le year

    [Figure: line plot of life expectancy (40 to 80) against year (1900 to 2000).]

    Line plots

    twoway line le year, connect(stairstep)

    [Figure: stair-step line plot of life expectancy (40 to 80) against year (1900 to 2000).]

    Schemes

    A scheme specifies the overall look of the graph. To change the scheme for the session:

    set scheme s2mono

    graph query, schemes

    Available schemes are

    s2color      see help scheme_s2color
    s2mono       see help scheme_s2mono
    s2manual     see help scheme_s2manual
    s2gmanual    see help scheme_s2gmanual
    s2gcolor     see help scheme_s2gcolor
    s1color      see help scheme_s1color
    s1mono       see help scheme_s1mono
    s1rcolor     see help scheme_s1rcolor
    s1manual     see help scheme_s1manual
    sj           see help scheme_sj
    economist    see help scheme_economist
    s2color8     see help scheme_s2color8
    lean1        see help scheme_lean1
    lean2        see help scheme_lean2
    rbn1mono     see help scheme_rbn1mono

    Schemes

    twoway line le year, scheme(s1color)

    [Figure: line plot of life expectancy (40 to 80) against year (1900 to 2000), drawn with the s1color scheme.]

    Multiple graphs in one

    Overlay multiple twoway graphs (enclose each plot in parentheses or separate the plots by ||)

    twoway (line le_male year) (line le_female year)

    [Figure: two overlaid lines against year (1900 to 2000), y-axis 40 to 80, no y-axis title. Legend: "Life expectancy, males", "Life expectancy, females".]
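
    The slide mentions the || separator but only shows the parenthesis form. A short sketch of the alternatives, assuming the uslifeexp data is loaded as above; all three commands should draw the same two lines:

    ```stata
    sysuse uslifeexp, clear

    * Parenthesis notation
    twoway (line le_male year) (line le_female year)

    * Equivalent || notation
    twoway line le_male year || line le_female year

    * line also accepts several y-variables at once
    twoway line le_male le_female year
    ```

    The parenthesis form tends to be easier to read once each plot carries its own options.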

    Graphics options

    Stata can produce nice, publishable graphics in a few steps:

    Axis limits are chosen automatically
    Legends are added automatically
    Axis labels are obtained automatically from the variable labels

    To alter the defaults we need to add options (everything after the comma). General syntax of the scatter graphics command:

    twoway scatter varlist [if] [in] [, options]

    Options are divided into options for the subgraph (here scatter), e.g.

    marker_options         change look of markers (colour, size, etc.)
    connect_options        change look of lines or connecting method
    axis_choice_options    associate plot with alternate axis

    and options for the global graph command:

    twoway_options         by, name, titles, legends, axes, etc.

    Graphics options

    General syntax:

    twoway (line ..., line_options) ///
        (scatter ..., scatter_options) ///
        (lfit ..., lfit_options), twoway_options

    In the previous plot the y-axis title disappeared. It can be set with ytitle:

    twoway (line le_male year) (line le_female year), ///
        ytitle(Life expectancy in years)

    Some twoway options:

    by(varlist, ...)      repeat for subgroups
    nodraw                suppress display of graph
    name(name, ...)
    scheme(schemename)    overall look
    xtitle, ytitle        axis titles
    xlabel, ylabel        axis label positions
    legend                legend options
    title, subtitle       graph title
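
    A sketch combining several of the twoway options listed above, assuming the uslifeexp data. The graph name le_by_sex and the legend texts are hypothetical choices made for illustration:

    ```stata
    sysuse uslifeexp, clear

    * Build the graph without showing it (nodraw) and keep it in memory (name)
    twoway (line le_male year) (line le_female year), ///
        ytitle("Life expectancy in years") ///
        xtitle("Year") ///
        ylabel(30(10)90) ///
        legend(order(1 "Males" 2 "Females")) ///
        title("U.S. life expectancy") ///
        name(le_by_sex, replace) nodraw

    * Display the stored graph on request
    graph display le_by_sex
    ```

    Building with nodraw and displaying later can be convenient when many graphs are produced in a do-file and only some need to be inspected.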

    Graphics options

    scatter marker options:

    msymbol(symbolstylelist)       shape of marker
    mcolor(colorstylelist)         colour of marker, inside and out
    msize(markersizestylelist)     size of marker
    mfcolor(colorstylelist)        inside or "fill" colour
    mlcolor(colorstylelist)        colour of outline
    mlwidth(linewidthstylelist)    thickness of outline
    ..., jitter options, connect options, label options, ...

    line options:

    connect(connectstyle)          how to connect points
    sort[(varlist)]                how to sort before connecting
    cmissing(y|n)                  whether missing values are ignored
    lpattern(linepatternstyle)     line pattern (solid, dashed, etc.)
    lwidth(linewidthstyle)         thickness of line
    lcolor(colorstyle)             colour of line
    lstyle(linestyle)              overall style of line
    ...

    Graphics options

    twoway (line le_male year) (line le_female year), ///
        ytitle(Life expectancy in years)

    [Figure: life expectancy for males and females against year (1900 to 2000), y-axis 40 to 80 with the title "Life expectancy in years". Legend: "Life expectancy, males", "Life expectancy, females".]

    Graphics options

    If the axis title is going to be reused many times we may store it in a macro

    local gopt ytitle(Life expectancy in years) ///
        xlabel(1875(25)2025) title(U.S. Life Expectancy)

    where we also add a little more space to the x-axis and a title

    twoway (line le_male year) (line le_female year), `gopt'

    ... and some different line types and colours

    local maleline "line le_male year, color(dknavy) lpattern(solid) connect(stairstep)"
    local femaleline "line le_female year, color(dkorange) lpattern(dash_dot) connect(stairstep)"

    twoway (`maleline') (`femaleline'), `gopt'

    Digression: Macros

    A macro is an alias that can be dereferenced everywhere(!) in the program

    local a 1
    local b a b c
    global b "Hello"

    Local macros live within the scope where they were defined (i.e. the do-file or program/function). A local macro is dereferenced as `name', a global macro as $name.

    di `a'
    di "$b `b'"

    1
    Hello a b c

    To evaluate a macro expression use =

    local a = `a'+1
    di `a'

    2
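
    The difference between defining a macro with and without = can be sketched as follows. The macro names s and v are hypothetical and chosen only for this illustration:

    ```stata
    local a 1

    * Without =, the text is stored as is: s contains the characters 1+1
    local s `a'+1

    * With =, the expression is evaluated first: v contains 2
    local v = `a'+1

    * Shows the stored text 1+1
    display "`s'"

    * display evaluates the expanded expression 1+1 and shows 2
    display `s'

    * Shows 2
    display "`v'"
    ```

    Run these lines together from a do-file, since local macros only live within the scope in which they were defined.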

    Digression: Macros

    Meta-programming with macros

    capture drop x1 x2
    input x1 x2
    1 3
    2 4
    end

    local idx 1 2
    foreach i in `idx' {
        list x`i' in 1/2
    }

         +----+
         | x1 |
         |----|
      1. |  1 |
      2. |  2 |
         +----+

         +----+
         | x2 |
         |----|
      1. |  3 |
      2. |  4 |
         +----+
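
    The same looping idea can be applied to graph commands. A minimal sketch, assuming the uslifeexp data; the graph names le_male and le_female are hypothetical and simply reuse the variable names:

    ```stata
    sysuse uslifeexp, clear

    * Loop over the two sex-specific series; `v' expands to male, then female
    foreach v in male female {
        twoway line le_`v' year, ///
            ytitle("Life expectancy in years") ///
            title("Life expectancy, `v's") ///
            name(le_`v', replace)
    }
    ```

    Each pass through the loop leaves one named graph in memory, so the two can afterwards be shown or combined by name.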

    Graphics options

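    twoway (`maleline') (`femaleline'), `gopt'

    [Figure: "U.S. Life Expectancy". Stair-step lines for males and females against year (x-axis labelled 1875 to 2025 in steps of 25), y-axis 40 to 80 with the title "Life expectancy in years". Legend: "Life expectancy, males", "Life expectancy, females".]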

    Line types

    palette linepalette

    [Figure: "Line pattern palette" showing solid, dash, longdash_dot, dot, longdash, dash_dot, shortdash, shortdash_dot, and blank.]

    Symbols

    palette symbolpalette

    [Figure: "Symbol palette" showing the following symbol names.]

    O  Oh  o  oh
    D  Dh  d  dh
    T  Th  t  th
    S  Sh  s  sh
    +  smplus
    X  x
    p

    Colours

    Customize with a single-line file color-colname.style (placed in the ado-path):

    set rgb "255 100 50"

    Titles, plot region

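    [Figure: positions of the titles around the plot region, with the ring position (ringposstyle) of each element.]

    Above the plot region, outermost first: title (ring 7), subtitle (6), t2title (2), t1title (1)
    Plot region: ring 0
    Below the plot region, innermost first: b1title (1), b2title (2), legend (3), note (4), caption (5)
    Left of the plot region, outermost first: l2title (2), l1title (1)
    Right of the plot region, innermost first: r1title (1), r2title (2)

    Where the titles are located is controlled by the scheme

    help title_options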

    Titles

    #delimit ;
    twoway (`maleline') (`femaleline'),
        title("U.S. Life Expectancy")
        subtitle("1900-1999")
        caption("Life expectancy by gender")
        note("Data 1900-1999")
        legend(col(1) ring(0) position(11))
        yscale(range(30 90))
        ylabel(30(10)90)
        name(lifeexptitles, replace);
    #delimit cr

    The position is given as a clock position (here 11 o'clock, i.e. upper left)
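
    A small sketch of how ring() and position() move the legend, assuming the uslifeexp data. The two placements are arbitrary examples:

    ```stata
    sysuse uslifeexp, clear

    * Legend inside the plot region (ring(0)) at 5 o'clock, i.e. lower right
    twoway (line le_male year) (line le_female year), ///
        legend(col(1) ring(0) position(5))

    * Legend outside the plot region at 3 o'clock, i.e. to the right
    twoway (line le_male year) (line le_female year), ///
        legend(col(1) position(3))
    ```

    With ring(0) the legend shares space with the data, so it is worth choosing a corner where the curves do not run.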

    Titles

    [Figure: "U.S. Life Expectancy", subtitle "1900-1999". Life expectancy for males and females (y-axis 30 to 90) against year (1900 to 2000), legend in one column inside the plot region at upper left. Caption: "Life expectancy by gender". Note: "Data 1900-1999".]

    Example 2, World Life Expectancy 1998

    sysuse lifeexp, clear

    describe

    Contains data from /Applications/Stata/ado/base/l/lifeexp.dta
      obs:            68                          Life expectancy, 1998
     vars:             6                          26 Mar 2011 09:40
     size:         2,652                          (_dta has notes)
    -------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------
    region          byte    %12.0g     region     Region
    country         str28   %28s                  Country
    popgrowth       float   %9.0g               * Avg. annual % growth
    lexp            byte    %9.0g               * Life expectancy at birth
    gnppc           float   %9.0g               * GNP per capita
    safewater       byte    %9.0g               *
                                                * indicated variables have notes
    -------------------------------------------------------------------------------
    Sorted by:

    Scatter plot matrices

    graph matrix gnppc popgrowth lexp safewater, ///
        half note("")

    [Figure: lower-triangle scatter plot matrix of GNP per capita, Avg. annual % growth, Life expectancy at birth, and safewater.]

    Scatter plots

    Plotting the association between GNP (on log-scale) and life expectancy in North America and South America, with different colours

    twoway (scatter lexp gnppc if region==2, mcolor(dkorange) msize(0.8)) ///
        (scatter lexp gnppc if region==3, mcolor(dknavy) msize(0.8)), ///
        xscale(log) xlabel(1000 2000 4000 8000 16000, angle(90)) ///
        legend(order(1 "North America" 2 "South America"))

    Subsetting

    Notice that we have here used the if qualifier together with logical expressions to subset the scatter plots. This applies to every graph command!

    Stratification

    We can also use the by option to make different plots for different levels of a third variable.

    Scatter plots

    From the overlay plot using if qualifiers:

    [Figure: life expectancy at birth (55 to 80) against GNP per capita on a log scale (1000 to 16000, labels rotated 90 degrees). Legend: "North America", "South America".]

    Scatter plots, by

    twoway (scatter lexp gnppc), ///
        by(region, row(1)) xsize(10)

    [Figure: three panels in one row (Eur & C.Asia, N.A., S.A.), each showing life expectancy at birth (50 to 80) against GNP per capita (0 to 40000). Note: "Graphs by Region".]

    Scatter plots

    Labels can be added to the graph (here we also remove the points)

    scatter lexp gnppc if region==2, ///
        mlabsize(2) mlabel(country) mlabposition(0) msymbol(i)

    [Figure: life expectancy at birth (55 to 80) against GNP per capita (0 to 30000) for region 2, with each country shown by its name instead of a point: Canada, Dominican Republic, El Salvador, Guatemala, Haiti, Honduras, Jamaica, Mexico, Nicaragua, Panama, Trinidad and Tobago, United States.]

    Scatter plots, point sizes

    The size of the points can depend on another variable

    scatter lexp gnppc if region==2 [pweight=popgrowth], ///
        msymbol(Oh)

    [Figure: life expectancy at birth (55 to 80) against GNP per capita (0 to 30000) for region 2, drawn with hollow circles whose size reflects popgrowth.]

    Curve fits

    Linear regression (lfit, lfitci) or quadratic (qfit, qfitci)

    twoway (lfitci lexp safewater) (scatter lexp safewater)

    [Figure: scatter of life expectancy at birth (55 to 80) against safewater (20 to 100) with fitted line and confidence band. Legend: "95% CI", "Fitted values", "Life expectancy at birth".]
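
    The quadratic variant is used in the same way. A sketch assuming the lifeexp data; the legend texts are hypothetical relabelings of the three legend keys:

    ```stata
    sysuse lifeexp, clear

    * Quadratic fit with confidence band, overlaid with the raw data
    twoway (qfitci lexp safewater) (scatter lexp safewater), ///
        ytitle("Life expectancy at birth") ///
        legend(order(1 "95% CI" 2 "Quadratic fit" 3 "Observed"))
    ```

    Drawing the fit first and the scatter second keeps the points visible on top of the shaded band.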

    Bar plots

    graph bar (mean) lexp (p50) lexp (mean) safewater (p50) safewater, over(region)

    [Figure: bar chart by region (Eur & C.Asia, N.A., S.A.), y-axis 0 to 80. Legend: "mean of lexp", "p 50 of lexp", "mean of safewater", "p 50 of safewater".]

    Box plots

    Box-whisker plots

    Gives a quick summary of the marginal distribution of continuous variables. Useful for getting a quick overview of skewness, potential outliers, etc. for many variables.

    graph box lexp, over(region) marker(1, mlabel(country))

    The box limits are the 25% and 75% quantiles, with the median marked as a line in the box. The whiskers extend to the most extreme observations (min/max) within 1.5 IQR of the box limits; observations beyond that are drawn as individual points.
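
    The quantities behind one box can be computed by hand. A sketch assuming the lifeexp data and region 2 (North America in the earlier slides); the macro names q1, q3, iqr, lower and upper are hypothetical, and the percentile definition used by graph box may differ slightly from the one used here:

    ```stata
    sysuse lifeexp, clear

    * Quartiles of life expectancy within region 2
    quietly summarize lexp if region==2, detail
    local q1  = r(p25)
    local q3  = r(p75)
    local iqr = `q3' - `q1'

    * Limits at 1.5 IQR beyond the box limits
    local lower = `q1' - 1.5*`iqr'
    local upper = `q3' + 1.5*`iqr'

    display "box: `q1' to `q3', median " r(p50)
    display "limits: `lower' to `upper'"

    * Observations outside the limits are the ones drawn as individual points
    list country lexp if region==2 & (lexp < `lower' | lexp > `upper')
    ```

    Comparing the listed countries with the labelled points in the box plot is a quick way to check the reading of the whisker rule.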

    Box plots

    graph box lexp, over(region) marker(1, mlabel(country))

    [Figure: box plots of life expectancy at birth (55 to 80) by region (Eur & C.Asia, N.A., S.A.), with outside points labelled Haiti and Bolivia.]

    Histograms, density estimation

    Histograms can be generated with the syntax

    histogram lexp, bin(#)
    histogram lexp, width(#)

    where either the number of bins or the bin width is given. Selecting the width or the number of bins can be difficult. By default the number of bins k is selected ad hoc from the number of observations n:

    k = min{sqrt(n), 10 log10(n)}

    We can overlay a normal approximation (option normal) or a non-parametric kernel density estimate (option kdensity); there is also a separate kdensity command.

    histogram lexp, normal normopt(lpattern(dot))
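
    The default rule can be evaluated directly. A sketch assuming the lifeexp data; the macro names n and k, and the values bin(8) and width(2.5), are illustrative choices. With the 68 observations reported by describe, sqrt(68) is about 8.2 and is the smaller of the two terms:

    ```stata
    sysuse lifeexp, clear

    * Number of non-missing observations and the value given by the rule
    quietly count if !missing(lexp)
    local n = r(N)
    local k = min(sqrt(`n'), 10*log10(`n'))
    display "n = `n', rule gives k = " %5.2f `k'

    * Request the number of bins explicitly, or set the bin width instead
    histogram lexp, bin(8) normal
    histogram lexp, width(2.5) kdensity
    ```

    Trying a few values of bin() or width() shows how sensitive the apparent shape of the distribution is to this choice.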

    Histograms

    histogram lexp, normal normopt(lpattern(dot))

    [Figure: histogram of life expectancy at birth (55 to 80) on the density scale (0 to .1), with a dotted normal density curve overlaid.]

    QQ-plots

    Comparison with the theoretical quantiles of a normal distribution

    qnorm lexp

    [Figure: quantiles of life expectancy at birth (50 to 80) against the quantiles of the normal distribution ("Inverse Normal", 60 to 85).]

    QQ-plots

    capture drop z
    gen z = rnormal()
    qnorm z
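
    To see what a departure from normality looks like, a simulated normal sample can be set next to a skewed one. A sketch with assumed values: the seed, the sample size of 200 and the graph names qq_normal and qq_skewed are arbitrary choices, and the skewed variable is an exponential draw generated by inversion:

    ```stata
    clear
    set seed 12345
    set obs 200

    * A normal sample and a right-skewed (exponential) sample
    generate z = rnormal()
    generate e = -ln(runiform())

    qnorm z, name(qq_normal, replace)
    qnorm e, name(qq_skewed, replace)

    * Show the two QQ-plots side by side
    graph combine qq_normal qq_skewed
    ```

    The normal sample should follow the reference line closely, while the skewed sample bends away from it in the tails.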