Niels Henrik Bruun, Section for General Practice, Dept...

Literate programming Niels Henrik Bruun, Section for General Practice, Dept. Of Public Health, Aarhus University


Source: pure.au.dk/portal/files/114965742/literate_programming.pdf

Literate programming

Niels Henrik Bruun, Section for General Practice, Dept. Of Public Health,

Aarhus University


Reproducible research

- Data
  - A prepared dataset
    - a lot of programming behind it
    - and data management
- Analysis
  - all code used for the published article
  - analysis and reporting are becoming more complex
- Accessibility
  - internet presentations and summaries?

A lot of programming:

- a researcher today must be, or work together with, a statistical programmer


Literate programming 1/2

- Learn from programming
  - Knuth
- Integrate text and code
  - to work with the idea behind the code
  - different types of extracts ([commented] code, log, temporary report, article)
  - Looking only at the final document is an error!


Literate programming 2/2

- Work process
  - work (interactively) in Stata
  - write mainly text (if commenting is done properly)
  - some analysis worth doing is not worth showing!!!
  - presentation
- Some text formats to integrate into
  - Word/PowerPoint (not well integrated with Stata)
  - html - not readable in raw form
  - tex/latex (beamer) - for nerds, a bit more readable in raw form
  - markdown - simple, readable, covers 90%
- Integration of statistical code and text is not new
  - S/R: Sweave
  - R: knitr
  - Stata: Sweave, statweave, SAR, markdoc, texdoc, webdoc, log2markup, dyndoc, putdocx, putpdf
  - also commands for making tables in different text formats


log2markup

Command that selects marked text, code and/or output blocks:

- comments surrounded by /*** and ***/ are kept
- comments surrounded by /* and */ are ignored
- comments starting with * are ignored
- comments starting with // are ignored

Prefixes to commands:

- /**/ Show only the command
- /***/ Show only the output from the command
- /****/ Show only the output from the command, without formatting - SMART!!
  - integrates table prints from basetable and matprint into the output
- No prefix: show command and output (just like a Stata log)

Internal code blocks with comments:

- Command and comment blocks surrounded by //OFF and //ON are ignored
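A minimal Python sketch of these marker rules, to make them concrete. It is not log2markup itself: `classify_do_lines` is a hypothetical name, it reads do-file lines rather than a Stata log (so it only labels what would be shown and cannot produce output), and it assumes each `/***` and `***/` marker sits on its own line.

```python
def classify_do_lines(lines):
    """Toy classifier for log2markup-style markers (hypothetical sketch).

    Returns (kind, text) pairs, where kind is one of
      'text'       - markdown text kept from /*** ... ***/ blocks
      'command'    - /**/ prefix: show the command only
      'output'     - /***/ prefix: show only the command's output
      'raw_output' - /****/ prefix: output only, without formatting
      'both'       - no prefix: command and output, like a Stata log
    """
    result = []
    in_text = False     # inside a /*** ... ***/ block
    in_comment = False  # inside a plain /* ... */ block
    off = False         # between //OFF and //ON
    for line in lines:
        s = line.strip()
        if off:
            if s == "//ON":
                off = False
            continue
        if in_text:
            if s == "***/":
                in_text = False
            else:
                result.append(("text", line.rstrip()))
            continue
        if in_comment:
            if s.endswith("*/"):
                in_comment = False
            continue
        if s == "//OFF":
            off = True
        elif s == "/***":
            in_text = True
        elif s.startswith("/****/"):
            result.append(("raw_output", s[6:].strip()))
        elif s.startswith("/***/"):
            result.append(("output", s[5:].strip()))
        elif s.startswith("/**/"):
            result.append(("command", s[4:].strip()))
        elif s.startswith("/*"):
            # plain block comment: ignored; it may close on the same line
            in_comment = not s.endswith("*/")
        elif s == "" or s.startswith("*") or s.startswith("//"):
            continue  # blank lines and line comments are ignored
        else:
            result.append(("both", s))
    return result
```

For example, a `/***/ summarize price` line comes back as `('output', 'summarize price')`: only its output would reach the document.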


Make a Word document from log2markup output using markdown

1. Generate a log file (/***/ hides the command but inserts its output)

    capture log close
    log using toWord.log, text replace
    /***/ do toWord.do
    log close

2. Transform the log file using log2markup

    log2markup using toWord.log, replace extension(md)

3. Use pandoc to create a Word document for distribution (with pandoc 2.0 or later, smart punctuation is the default for markdown input and the old -S/--smart flag is rejected)

    shell pandoc -s toWord.md -o toWord.docx

The do file toWord.do results in this Word document


Basetable 1 of 3

- Easy build of a summary table (used in almost every article)
  - requires variable labels and value labels for the variables used
- Format and report options
  - Continuous variables: format + sd, 95% ci, iqi, iqr
  - Categorical variables: row/col percentages, single value (ci)
  - Total reported
  - Comparison test
  - Titles for groups of variables
  - Sub-conditioning
  - Missing values with option missing
  - Export to Excel with option toxl
  - With option style it can be integrated in md/csv/latex/html documents
  - Hide counts less than e.g. 5 in reports (DST)
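Two of the cell formats above, sketched in Python for illustration: `n (%)` for categorical values with optional masking of small counts, and the median with its interquartile interval (iqi) for continuous variables. The helper names, the `<5` mask text and the quartile method (Python's `statistics.quantiles` with `method="inclusive"`) are assumptions; basetable's own quartile definition and masking may differ.

```python
from statistics import median, quantiles

def count_pct(count, total, decimals=1, hide_below=None):
    """Format a category count as 'n (%)' (hypothetical helper).

    Counts below hide_below are masked, mimicking the option of hiding
    small counts in reports; the '<k' placeholder text is an assumption.
    """
    if hide_below is not None and count < hide_below:
        return f"<{hide_below}"
    return f"{count} ({100 * count / total:.{decimals}f})"

def median_iqi(values, decimals=1):
    """Format 'median (q1; q3)', the median with its interquartile interval."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fmt = f".{decimals}f"
    return f"{median(values):{fmt}} ({q1:{fmt}}; {q3:{fmt}})"
```

With the counts from the demo table, `count_pct(71, 124)` gives `71 (57.3)`, the white/Normal cell (124 = 71 + 14 + 39 non-missing race values in that column).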


Basetable 2 of 3

    basetable low ///
        [* Quartile interval for age] age(%6.1f, iqi) ///
        [* Counts and % for categorical variables] race(c) ///
        [* CI for single categorical value] smoke(Yes, ci) ///
        , style(md) caption(A basetable demo) //missing

|                                           | Normal            | Low               | Total             | P-value |
|-------------------------------------------|-------------------|-------------------|-------------------|---------|
| n (%)                                     | 127 (68.3)        | 59 (31.7)         | 186 (100.0)       |         |
| * Quartile interval for age               |                   |                   |                   |         |
| age of mother, median (iqi)               | 23.0 (19.0; 28.0) | 22.0 (19.0; 25.0) | 23.0 (19.0; 26.0) | 0.24    |
| * Counts and % for categorical variables  |                   |                   |                   |         |
| race, n (%)                               |                   |                   |                   |         |
| white, n (%)                              | 71 (57.3)         | 23 (39.0)         | 94 (51.4)         |         |
| black, n (%)                              | 14 (11.3)         | 11 (18.6)         | 25 (13.7)         |         |
| other, n (%)                              | 39 (31.5)         | 25 (42.4)         | 64 (35.0)         | 0.06    |
| * CI for single categorical value         |                   |                   |                   |         |
| smoked during pregnancy (Yes), % (95% ci) | 32.3 (24.2; 40.4) | 50.8 (38.1; 63.6) | 38.2 (31.2; 45.2) | 0.02    |

Table: A basetable demo
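The confidence intervals in the last row can be checked by hand. Back-calculating from the table, 32.3% of the 127 Normal births is 41 smokers, and a plain normal-approximation (Wald) interval for 41/127 reproduces `32.3 (24.2; 40.4)`. That basetable uses the Wald interval is our inference from these numbers, not something stated on the slide. A minimal Python sketch (the helper name is hypothetical):

```python
from statistics import NormalDist

def pct_wald_ci(successes, n, level=0.95, decimals=1):
    """Format 'pct (lower; upper)' with a Wald (normal-approximation) interval."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half = z * (p * (1 - p) / n) ** 0.5        # z times the standard error of p
    fmt = f".{decimals}f"
    return f"{100 * p:{fmt}} ({100 * (p - half):{fmt}}; {100 * (p + half):{fmt}})"
```

`pct_wald_ci(41, 127)` returns `32.3 (24.2; 40.4)`, and `pct_wald_ci(30, 59)` and `pct_wald_ci(71, 186)` give the Low and Total cells.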


Basetable 3 of 3

With the prefix /****/ on the basetable command, it looks like this:

Table 1: A basetable demo

| | Normal | Low | Total | P-value | Missings / N (Pct) |
|---|---|---|---|---|---|
| n (%) | 127 (68.3) | 59 (31.7) | 186 (100.0) | | 3 / 189 (1.59) |
| * Quartile interval for age | | | | | |
| age of mother, median (iqi) | 23.0 (19.0; 28.0) | 22.0 (19.0; 25.0) | 23.0 (19.0; 26.0) | 0.24 | 4 / 189 (2.12) |
| * Counts and % for categorical variables | | | | | |
| race, n (%) | | | | | |
| white, n (%) | 71 (57.3) | 23 (39.0) | 94 (51.4) | | |
| black, n (%) | 14 (11.3) | 11 (18.6) | 25 (13.7) | | |
| other, n (%) | 39 (31.5) | 25 (42.4) | 64 (35.0) | 0.06 | 6 / 189 (3.17) |
| * CI for single categorical value | | | | | |
| smoked during pregnancy (Yes), % (95% ci) | 32.3 (24.2; 40.4) | 50.8 (38.1; 63.6) | 38.2 (31.2; 45.2) | 0.02 | 0 / 189 (0.00) |
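The extra column reads as missing / N (percent missing). A minimal Python sketch of that cell, assuming `None` marks a missing value and that the percentage is taken over all 189 observations; the helper name is hypothetical.

```python
def missing_cell(values, decimals=2):
    """Format 'missing / N (pct)' for one variable; None marks a missing value."""
    n = len(values)
    n_missing = sum(v is None for v in values)
    return f"{n_missing} / {n} ({100 * n_missing / n:.{decimals}f})"
```

With 3 missing values out of 189 this gives `3 / 189 (1.59)`, the first row of the table.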


sumat/matprint

- Extension of summarize
  - summarize results collected in a matrix
  - handles string variables when possible
  - more functionality, e.g. ci, iqi, unique, missing etc.

    sysuse auto
    sumat make price mpg rep78, statistics(n missing unique mean ci) style(md) rowby(foreign) decimals((0,0,0,2))

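| | | n | missing | unique | mean | ci95% lb | ci95% ub |
|---|---|---|---|---|---|---|---|
| foreign(Domestic) | Make and Model | 52 | 0 | 52 | . | . | . |
| | Price | 52 | 0 | 52 | 6072.42 | 5230.64 | 6914.21 |
| | Mileage (mpg) | 52 | 0 | 17 | 19.83 | 18.54 | 21.12 |
| | Repair Record 1978 | 48 | 4 | 5 | 3.02 | 2.78 | 3.26 |
| foreign(Foreign) | Make and Model | 22 | 0 | 22 | . | . | . |
| | Price | 22 | 0 | 22 | 6384.68 | 5289.07 | 7480.29 |
| | Mileage (mpg) | 22 | 0 | 13 | 24.77 | 22.01 | 27.54 |
| | Repair Record 1978 | 21 | 1 | 3 | 4.29 | 3.98 | 4.59 |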
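To show what one row of this matrix holds, here is a minimal Python sketch computing n, missing, unique, mean and a 95% interval for a single variable. `sumat_row` is a hypothetical name, `None` stands for a missing value, and the interval is the normal quantile (about 1.96) times the standard error; sumat's exact interval formula is an assumption here. Non-numeric variables get counts only, mirroring the `.` cells for Make and Model.

```python
from statistics import NormalDist, mean, stdev

def sumat_row(values, level=0.95):
    """n, missing, unique, mean and a normal-approximation CI for one variable.

    Hypothetical sketch: None marks a missing value; non-numeric variables
    get counts only, with None where the table prints '.'.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    row = {"n": n, "missing": len(values) - n, "unique": len(set(present)),
           "mean": None, "lb": None, "ub": None}
    if n >= 2 and all(isinstance(v, (int, float)) for v in present):
        m = mean(present)
        z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
        half = z * stdev(present) / n ** 0.5       # z times the standard error
        row.update(mean=m, lb=m - half, ub=m + half)
    return row
```

A t quantile instead of z would widen the interval slightly; the difference matters mainly for small groups.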


crossmat - if it is worth showing, it is worth reusing

    use hospid_demo, clear
    crossmat race low
    return list

    matrices:
           r(lrchi2) : 4 x 3
             r(chi2) : 4 x 3
             r(cpct) : 4 x 3
             r(rpct) : 4 x 3
           r(greeks) : 3 x 2
            r(tests) : 2 x 3
         r(expected) : 4 x 3
              r(pct) : 4 x 3
           r(counts) : 4 x 3

    matprint 100 * r(pct), d(0) s("md")

| | | Normal | Low | Total |
|---|---|---|---|---|
| race | white | 39 | 13 | 51 |
| | black | 8 | 6 | 14 |
| | other | 21 | 14 | 35 |
| | Total | 68 | 32 | 100 |
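The 4 x 3 dimensions in the return list fit the three race categories plus a total row, by the two low categories plus a total column. A minimal Python sketch of how such margin-augmented tables (counts, cell, row and column percentages, expected counts) and a markdown rendering in the spirit of `matprint ..., d(0) s("md")` could be produced. The function names are hypothetical, the chi-square matrices and tests are left out, and every row and column total is assumed to be non-zero.

```python
def cross_tables(counts):
    """Add margins to a table of counts and derive percentage tables."""
    rows = [list(r) + [sum(r)] for r in counts]    # append row totals
    rows.append([sum(col) for col in zip(*rows)])  # append column totals
    grand = rows[-1][-1]
    col_tot = rows[-1]
    return {
        "counts": rows,
        "pct": [[100 * c / grand for c in r] for r in rows],
        "rpct": [[100 * c / r[-1] for c in r] for r in rows],
        "cpct": [[100 * c / col_tot[j] for j, c in enumerate(r)] for r in rows],
        "expected": [[r[-1] * col_tot[j] / grand for j in range(len(r))] for r in rows],
    }

def to_markdown(matrix, row_names, col_names, decimals=0):
    """Render a matrix as a markdown pipe table with a fixed number of decimals."""
    lines = ["| | " + " | ".join(col_names) + " |",
             "|---|" + "---:|" * len(col_names)]
    for name, row in zip(row_names, matrix):
        cells = " | ".join(f"{v:.{decimals}f}" for v in row)
        lines.append(f"| {name} | {cells} |")
    return "\n".join(lines)
```

Because each cell is rounded on its own, rounded percentages need not add up to the rounded totals: in the table above, 39 + 13 gives 52 while the white total shows 51.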


Summary

When working with applied statistics:

- It should be easy to integrate comments, code and outputs in scientific work
- What is calculated in logs should be easy to use/reuse
- “Keep everything in one document (do file)”
- Output should be flexible / reusable

This is my attempt in this direction!


Questions?

Talk to my axe!!

Figure 1