Workshop: High-performance computing for economists · Stampede (no. 6 on Top500 as of June 2013)...

Intro Basics VCS HPC resources

Workshop: High-performance computing for economists

Lars Vilhuber¹, John M. Abowd¹, Richard Mansfield¹, Hautahi Kingi¹, Flavio Stanchi¹, Sida Peng¹, Kevin L. McKinney

¹ Cornell University, Economics Department

August 17-19, 2015

Vilhuber, Abowd, Mansfield, McKinney Computing for Economists


HPC

Back in the days...

RAM: 2,000 words (2 kB); Speed: 2 MHz. Source: Wikipedia


They went to the moon

Source: Flickr


Big progress

RAM: 2 × 32 kB; Speed: 1 MHz; $1,500 (today’s USD). Source: Wikipedia


Today

RAM: 2 × 1024² kB; Speed: 1,700 MHz × 4; $700 (today’s USD). Source: Wikipedia


We still fly to the moon

Source: CNET


This is where you can go

Stampede (no. 6 on Top500 as of June 2013)

RAM: 192 × 1024³ kB; Speed: 2,700 MHz × 462,462. Source: TACC


But first...

http://viewfromwitsend.wordpress.com/


What do you learn in a Ph.D. program?

How to learn...


Goal of this class

To open new doors, to be able to conceive of problems that you didn’t think had a feasible solution.

To broaden your knowledge about what you do NOT know.


So in order to do that...


Overview

Day 1

- Programming basics (Lars)
  - Choosing an editor
  - How to structure programs, texts, etc.
  - A clean sequence of programs
  - NX, SSH, Linux, request an account on cluster
  - Basic scripting
- Basics of version control (Lars)
  - File-system based version control
  - More formal version control (Subversion, Git)
  - Working with servers
  - Setting up infrastructure at Cornell
- HPC resources at Cornell, elsewhere


Overview

Day 3


Structure of the class

Teaching...

We’ll take you on a 4,000 m flight through topics...

... and practice

... and then swoop in on some examples, leaving ample time to practice it.


Choosing editors

Why does choosing editors matter?

The (applied) research process iterates through writing papers and doing estimation. You want to use the appropriate tools for each task.

Integrated or separate

- You can use native tools that come with each word processing facility/programming language/etc.
- Not all of them will have one.
- Not all of them will work on all platforms.
- You will likely use multiple tools.


Choosing an editor

... or system

Separate editors and systems

- MS Word and math editor (Windows/OSX, but compatibility issues)
- LibreOffice (Windows/OSX/Linux, but not as good)
- Notepad++ (Windows)
- Gedit, (X)Emacs, Kate (Linux)
- Sublime Text (OSX)
- Atom (all, see also MS Visual Studio Code)

LaTeX: all platforms, but some GUIs are not cross-platform, ease of use varies:

- TeXstudio (all platforms)
- TeXMaker (all platforms)
- Scientific Workplace (Windows, mythical Linux)
- TeXWorks+MiKTeX
- TeXnicCenter
- and (many more)


Choosing an editor

... or system

Integrating programming and running

- IDE (Eclipse, ActiveState Komodo, etc.)
- Native programming GUIs (SAS, Matlab, Stata)
- Gedit, (X)Emacs (with add-on functionality)

Integrating programs and text/results

- Sweave/knitr (integrates LaTeX and R)
- RStudio (GUI to R and Sweave/knitr)
- Shiny (web interface to R with dynamic results)
- StatRep (integrated SAS and LaTeX, Source 1, Source 2)


Structuring programs

Easy...

Listing 1: mystuff.sas

    data "C:\Users\Me\CensusChina.sas7bdat";
      set "C:\Users\Me\CensusChina.sas7bdat";
      earn = log(earn);
    run;
    proc reg data="C:\Users\Me\CensusChina.sas7bdat";
      model earn = sex education experience;
    run;

What can possibly be wrong about that?


Structuring programs 2

Easier...

Listing 2: mystuff.do

    use "C:\Users\Me\CensusChina.dta"
    replace earn = log(earn)
    regress earn sex education experience
    save, replace

What can possibly be wrong about that?


Structuring programs 3

Actually...

Everything!

- Name of program: uninformative
- Destruction of original data: program cannot be re-run for same results
- No portability: cannot be run anywhere else
- No explanation: why are we doing this?

But of course, nobody does that, right?


Structuring programs 4

Better...?

Listing 3: china-regression.sas

    data logCensusChina;
      set "C:\Users\Me\CensusChina.sas7bdat";
      earn = log(earn);
    run;
    proc reg data=logCensusChina;
      model earn = sex education experience;
    run;

Somewhat...


Structuring programs 5

Addressing these issues

- Naming of programs: here
- Commenting: here
- Versioning: up next
- Portability and data management: tomorrow
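A minimal sketch of how the naming, commenting, and non-destructive points might look in practice. It is written in Python rather than the SAS and Stata of the listings. The file name in the header, the relative paths, and the `log_earn` column are assumptions made for illustration and are not part of the workshop material. The point is that the script explains itself and writes to a new file, so the input is never overwritten.

```python
"""02_01_create_log_earnings.py  (hypothetical name)

Purpose: add log earnings to a census extract WITHOUT overwriting the input.
Input:   a CSV file with an `earn` column (path supplied by the caller)
Output:  a new CSV file with an extra `log_earn` column
"""
import csv
import math
from pathlib import Path


def add_log_earnings(src: Path, dst: Path) -> int:
    """Copy rows from src to dst, adding a log_earn column; return rows written."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        fields = list(reader.fieldnames or []) + ["log_earn"]
        writer = csv.DictWriter(fout, fieldnames=fields)
        writer.writeheader()
        n = 0
        for row in reader:
            earn = float(row["earn"])
            # The original earn column is kept; log is undefined for earn <= 0,
            # so those rows get an empty log_earn instead of an error.
            row["log_earn"] = math.log(earn) if earn > 0 else ""
            writer.writerow(row)
            n += 1
    return n
```

Called as `add_log_earnings(Path("data/raw/census.csv"), Path("data/derived/census_log.csv"))` (both paths hypothetical), the raw file stays byte-for-byte identical, so the script can be re-run and give the same result.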


Key notions about naming

Think of yourself as highly amnesiac...

- The research paper you are writing now will be submitted, rejected, worked on, questioned...
- ... by others and yourself
- ... in intervals of weeks, months, years...
- Your future research assistant and the future YOU will need to understand how to go through it.


Naming

The really bad

    mystuff.R
    read.R
    version2.R
    ols.sas

The bad

    readCensus.R
    readBLS.R
    prepareCensus.R
    runOLS.sas


Better

    01_readBLS.R
    02_readCensus.R
    03_prepareCensus.R
    04_create_analysis_data.R
    05_runOLS.sas

Even better

    01_01_readBLS.R
    02_01_readCensus.R
    02_02_prepareCensus.R
    03_01_create_analysis_data.R
    04_01_runOLS.sas
    README.txt


Going overboard?

    icf/ctrlprogs/control_icf.sas
    icf/ctrlprogs/parameters_icf.sas
    icf/library/macros/icf_cleanup.sas
    icf/library/macros/icf_impute_county_res.sas
    icf/library/macros/licf_findnum.sas
    icf/library/macros/licf_proxy.sas
    icf/library/macros/licf_stars1.sas
    icf/library/macros/licf_tgr_latlongs.sas
    icf/library/sasprogs/01_icf_qa.sas
    icf/library/sasprogs/01_icf.sas
    icf/library/sasprogs/02_icf_qa.sas
    icf/library/sasprogs/02_icf.sas
    icf/library/sasprogs/03_icf_qa.sas
    icf/library/sasprogs/03_icf.sas
    [snip]
    icf/library/sasprogs/19_icf.sas

    ehf/ctrlprogs/control_ehf.sas
    ehf/library/macros/read_bls.sas
    ehf/library/sasprogs/01_ehf.sas
    [snip]


With minor modification

    icf/ctrlprogs/control_icf.sas
    icf/ctrlprogs/parameters_icf.sas
    icf/library/macros/icf_cleanup.sas
    icf/library/macros/icf_impute_county_res.sas
    icf/library/macros/licf_findnum.sas
    icf/library/macros/licf_proxy.sas
    icf/library/macros/licf_stars1.sas
    icf/library/macros/licf_tgr_latlongs.sas
    icf/library/sasprogs/01_icf.sas
    icf/library/sasprogs/02_icf.sas
    icf/library/sasprogs/03_icf.sas
    [snip]
    icf/library/sasprogs/19_icf.sas
    icf/library/sasprogs/01_icf_qa.sas
    icf/library/sasprogs/02_icf_qa.sas
    icf/library/sasprogs/03_icf_qa.sas

Can you figure out in what sequence to run them?
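One way to see why numbered prefixes help: the run sequence can be recovered mechanically from the names alone. The sketch below is in Python and uses file names modeled on the "Even better" list. The underscore separators and the two-level (stage, step) reading of the prefixes are assumptions, since the separators did not survive in the slides.

```python
import re

# File names modeled on the "Even better" slide; underscore separators are assumed.
programs = [
    "README.txt",
    "04_01_runOLS.sas",
    "02_02_prepareCensus.R",
    "01_01_readBLS.R",
    "03_01_create_analysis_data.R",
    "02_01_readCensus.R",
]

# Two numeric fields at the start of the name: stage, then step within the stage.
PREFIX = re.compile(r"^(\d+)_(\d+)_")


def run_order(names):
    """Return the numbered programs sorted by (stage, step); skip everything else."""
    keyed = []
    for name in names:
        m = PREFIX.match(name)
        if m:
            keyed.append(((int(m.group(1)), int(m.group(2))), name))
    return [name for _, name in sorted(keyed)]


for name in run_order(programs):
    print(name)
```

Comparing the prefixes as integers rather than as text keeps the order right even if someone later adds a stage `10` without zero-padding. Files without a numeric prefix, such as the README, are left out of the sequence.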


Why SSH?

Most compute clusters have ONLY SSH access
It is thus worthwhile to learn enough about it here, in order to be functional there: CAC “Red Cloud”, Amazon Cloud, XSEDE.

Linux rules... the HPC world
All 10 of the top 10 TOP500 computers run Linux (as the compiler front-end, if not compute OS)

Graphical access

Two types of graphical access

- with an “X server” (native in Linux, optional in Windows and OSX)
  → standard way on most clusters
- using NX client software for improved experience

Basic Linux, basic scripting

Why worry?

You will end up doing something on the command line

- Launch a program from a compute-cluster job
- Launch a job submission
- Basic scripting

Linux in 2 minutes

- ls will list the contents of a directory
- cd will “change directory”
- cd .. (note the space) will go up a directory
- cd (name) will go into the directory (name)
- rm (name) will delete the file (name)
- mkdir (name) will create a directory called (name)
- vi (name) will open a venerable command line editor for file (name)
  (CAUTION: to exit without saving, hit ESC, then type :q! and press Enter)
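To see these commands working together, here is a minimal sketch of a session in a throwaway directory. The names sandbox and notes.txt are made up for the illustration, and touch, pwd and mktemp are helper commands that are not on the slide.

```shell
# Work inside a throwaway directory so nothing important is touched.
# "sandbox" and "notes.txt" are hypothetical names chosen for this sketch.
workdir=$(mktemp -d)
cd "$workdir"

mkdir sandbox            # create a directory called sandbox
cd sandbox               # go into it
touch notes.txt          # create an empty file (touch is not on the slide)
listing_before=$(ls)     # ls lists the contents of the directory
rm notes.txt             # delete the file
listing_after=$(ls)      # the directory is now empty
cd ..                    # go back up one directory
pwd                      # print where we are now (pwd is not on the slide)

echo "before rm: $listing_before"
echo "after rm:  $listing_after"
```

Note that rm does not ask for confirmation and there is no recycle bin, which is one reason to practice in a scratch directory first.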

Basic scripting in Linux

A basic loop on the command line

    for (( i=0; i<10; i++ ))
    do
        echo $i
    done

    for i in 1 3 7 99
    do
        echo $i
    done

Source: [1]

Capturing output

You can capture the output from a command

    > seq 1 3
    1
    2
    3

Now let’s use that:

    for i in $(seq 1 3)
    do
        echo $i
    done
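The same $( ) capture can also store output in a variable for later use. The following is a small sketch, assuming a bash-like shell; the variable names numbers, total and count are chosen only for this example.

```shell
# Capture the output of seq in a variable, then loop over it.
numbers=$(seq 1 3)
total=0
for i in $numbers
do
    total=$(( total + i ))   # shell arithmetic: add i to the running total
done
echo "sum is $total"

# The output of a whole pipeline can be captured the same way.
# tr removes the padding that some versions of wc add around the number.
count=$(seq 1 3 | wc -l | tr -d ' ')
echo "seq printed $count lines"
```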

Basic scripting in Linux

Use for practical things
Remember that ICF program sequence? How would we go about starting 19 programs in sequence?

    for program in $(ls *icf.sas)
    do
        sas $program
    done
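One possible refinement of the loop above is to stop at the first program that fails, so later steps do not run on bad inputs. This is only a sketch: run_program is a hypothetical stand-in for the sas command, and the file names are made up so that the example can run on a machine without SAS.

```shell
# run_program is a hypothetical stand-in for "sas"; replace the call
# below with the real command on a system where SAS is installed.
run_program() {
    echo "running $1"
}

# Demo files in a throwaway directory (names are made up for the sketch).
demo_dir=$(mktemp -d)
touch "$demo_dir/01icf.sas" "$demo_dir/02icf.sas" "$demo_dir/10icf.sas"

ran=""
# The shell expands the wildcard in sorted order, so ls is not required.
for program in "$demo_dir"/*icf.sas
do
    # Stop at the first failure so later steps do not run on bad inputs.
    run_program "$program" || { echo "FAILED: $program"; break; }
    ran="$ran $(basename "$program")"
done
echo "ran in order:$ran"
```

The zero-padded numeric prefixes are what make the sorted order match the intended run order: without the padding, 10 would sort before 2.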

Advanced Linux in 2 minutes

The gateway to everything
man
or try http://www.linuxmanpages.com or http://linux.die.net/man/

The toolkit
- sed
- grep
- awk
- regex (regular expressions)
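As a rough illustration of what each tool in the toolkit does, here is a minimal sketch that applies grep, sed and awk to the same three lines of text. The sample text is invented for this example; real log formats will differ.

```shell
# A made-up three-line log used only for this illustration.
sample="NOTE: step 1 done
ERROR: file not found
NOTE: step 2 done"

# grep: keep only lines matching a pattern (here, lines starting with ERROR).
errors=$(printf '%s\n' "$sample" | grep '^ERROR')

# sed: substitute text using a regular expression.
renamed=$(printf '%s\n' "$sample" | sed 's/^NOTE:/INFO:/')

# awk: split each line into fields and print the third field of NOTE lines.
steps=$(printf '%s\n' "$sample" | awk '/^NOTE/ { print $3 }')

echo "$errors"
echo "$renamed"
echo "$steps"
```

The ^ in each pattern is a regular expression anchor meaning "start of line", which is the kind of regex building block all three tools share.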

Advanced scripting in Linux

Use for practical things
What if I’m running 100s of programs, and trying to figure out if any of them have errors?

    for logfiles in $(ls *icf.log)
    do
        grep ERROR $logfiles
    done
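With hundreds of logs, it may be more useful to know which files contain errors than to see every matching line. A possible variant, sketched here with invented log files and contents, uses the -l and -c options of grep.

```shell
# Throwaway log files with made-up contents, for illustration only.
log_dir=$(mktemp -d)
printf 'NOTE: ok\n' > "$log_dir/01icf.log"
printf 'NOTE: start\nERROR: something failed\n' > "$log_dir/02icf.log"
printf 'NOTE: ok\n' > "$log_dir/03icf.log"

# grep -l prints only the names of files that contain a match,
# so a loop is not needed to find the problem logs.
bad_logs=$(cd "$log_dir" && grep -l ERROR *icf.log)
echo "logs with errors: $bad_logs"

# grep -c prints the number of matching lines for each file.
(cd "$log_dir" && grep -c ERROR *icf.log)
```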

Now let’s try it out

Next section
