Research Code
Andrew Rosenberg
with
RA Manual: Notes on Writing Code by Matthew Gentzkow and Jesse Shapiro, Chicago Booth
and
http://betterexplained.com/articles/a-visual-guide-to-version-control/
Research Code has a Bad Reputation
• Research code is rarely written to be robust, reusable, or maintained long-term in version-controlled repositories.
• The consumer is usually the code’s author, or in some cases a few others in the lab.
• http://bytesizebio.net/index.php/2012/08/24/can-we-make-research-software-accountable/
Mistakes (Research) Programmers Make
• I just need to do this specific thing one time.
• I’ll remember what I did, if I need to do it again.
• No one is interested in this code.
• No one will ever see this code.
What research code looks like
• This is not application development.
• Often research code involves:
– a series of small scripts,
– linking together existing open source toolkits,
– reformatting input and output,
– generating plots and graphs.
• Where is the “software”?
What research code looks like
• The contribution of the paper may be
– an extension of an existing codebase,
– a set of small scripts and reformatting one-liners,
– implemented in multiple languages.
A new way of doing business
• These are bad excuses.
• There is a movement to encourage and incentivize distributing source code with publications.
• And there are facilities that support it.
Source Code dissemination
• Host it yourself.
• www.runmycode.org
• http://www.ipol.im/
• (many, many more)
What is good enough?
• Right now:
– ANYTHING.
• Ideally:
– “Production level” code that can be run or compiled on a standard configuration.
– Thorough documentation.
Intellectual Property and Licensing
• GPL – copyleft
• Apache
• many, many more
• You have copyright over your code.
• A license allows someone else to use it.
• Disclosures can limit your ability to patent.
Version Control
• Version control allows multiple users to edit the same content.
• Allows for coding in the open.
• Subversion, git, many more.
Coding for the User
• Code for your future self.
• You are your most important user.
Don’t try to be clever
• Write simple, understandable code.
• Efficiency in number of lines is not important.
• Efficiency in number of operations or memory also might not be important.
There are many ways to skin a cat
print "Just another Perl hacker,";
$_='987;s/^(\d+)/$1-1/e;$1?eval:print"Just another Perl hacker,"';eval;
$_ = "wftedskaebjgdpjgidbsmnjgc";tr/a-z/oh, turtleneck Phrase Jar!/; print;
Establish a coding style.
• ClassName
• nameMethodsUsingVerbs
• underscored_lowercase_variable_names
• CONSTANTS
• Spacing
– x_mean=x_total/n
– x_mean = x_total / n
• More than anything, be consistent.
Testing
• Unit tests.
– Small pieces of code that test “atomic” functionality of a program.
void testAddWorksCorrectly() {
    assertEquals(4, add(2, 2));
}
void testConstructorInitializesNameFieldToDefault() {
    Person p = new Person();
    assertEquals("John Smith", p.getName());
}
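The same two tests can be sketched in Python with the standard unittest module. The add function and the Person class below are hypothetical stand-ins, since the slides do not define them; run_tests is a small helper added only for illustration.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


class Person:
    """Hypothetical class whose name defaults to 'John Smith'."""

    def __init__(self, name="John Smith"):
        self.name = name

    def get_name(self):
        return self.name


class TestAtomicFunctionality(unittest.TestCase):
    def test_add_works_correctly(self):
        self.assertEqual(4, add(2, 2))

    def test_constructor_initializes_name_to_default(self):
        self.assertEqual("John Smith", Person().get_name())


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAtomicFunctionality)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test checks one small behavior, so a failure points directly at the piece that broke.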
Why write tests?
• Identify problems.
• Easier Changes.
• Simple integration.
• Documentation.
Test Driven Development
• Write a test
• Run the tests to see if it fails
• Write as little code as possible
• Make the tests pass (go green)
• Refactor code
• Repeat
[wikipedia]
Bug fixes and Testing
• When you find a bug in your code, write a test that “catches the bug”.
– It fails.
• The bug is fixed when the test passes.
• And the test will catch the bug if it ever comes back.
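As a rough illustration of this workflow, suppose a hypothetical median function used to return the upper middle element for even-length lists. The test below would have failed against that buggy version and passes once the fix is in place. The function, the bug, and the run_regression_tests helper are all invented for this sketch.

```python
import unittest


def median(values):
    """Hypothetical function; an earlier version ignored even-length input."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 0:
        # The fix: average the two middle values instead of returning ordered[mid].
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]


class TestMedianRegression(unittest.TestCase):
    def test_even_length_list_averages_middle_values(self):
        # This is the test that "catches the bug".
        self.assertEqual(2.5, median([1, 2, 3, 4]))

    def test_odd_length_list_still_works(self):
        self.assertEqual(2, median([3, 1, 2]))


def run_regression_tests():
    """Run the regression tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMedianRegression)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the test in the suite means any later change that reintroduces the bug fails immediately.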
Refactoring
• Just because code works, it doesn’t mean it’s done.
• Consolidate code to increase modularity
– Eliminate code duplication.
• Some examples
– Extract Class
– Extract Method
– Move/Rename Method
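A minimal sketch of Extract Method, assuming a made-up function that previously normalized two lists of counts with the same inline loop written twice. The duplicated logic moves into one helper, normalize, and the caller becomes shorter. All names here are hypothetical.

```python
def normalize(values):
    """Extracted helper: scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def compare_distributions(counts_a, counts_b):
    """Sum of absolute differences between two normalized count lists.

    Before refactoring, the normalization was written out inline twice,
    once for each argument.
    """
    probs_a = normalize(counts_a)
    probs_b = normalize(counts_b)
    return sum(abs(a - b) for a, b in zip(probs_a, probs_b))
```

A fix to the normalization now happens in one place, and the helper can be tested on its own.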
Code Review
• Give your code to another person for feedback.
• Companies do this to ensure consistent style and correctness.
• Research labs rarely do.
Some specific advice.
• Take an enormous amount of notes.
– What did you do?
– What did you learn?
– What bugs did you fix?
– What new issues did you find?
– What questions did you come up with?
Specifics
• Copy and Paste is your enemy.
– If you are copying and pasting code, you have probably made a mistake.
Specifics
• Use CONSTANTS
– Never encode constants inline in your code.
mean_height = total_height / 15

num_people = 13
mean_height = total_height / num_people
data[17] = 'Andrew'
data[18] = 1.78

name_idx = 17
score_idx = 18
data[name_idx] = 'Andrew'
data[score_idx] = 1.78
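The snippets above can be pulled together into one runnable sketch. The constant names follow the CONSTANTS style from the coding-style slide. The record size of 20 and the make_record helper are assumptions added so that the example is self-contained.

```python
NUM_PEOPLE = 13   # sample size, defined once and reused everywhere
NAME_IDX = 17     # position of the name field in a record
SCORE_IDX = 18    # position of the score field in a record
RECORD_SIZE = 20  # assumed record length, chosen only for this sketch


def mean_height(total_height, num_people=NUM_PEOPLE):
    """Divide by a named constant rather than a bare number."""
    return total_height / num_people


def make_record(name, score):
    """Build a record using named indices instead of inline 17 and 18."""
    record = [None] * RECORD_SIZE
    record[NAME_IDX] = name
    record[SCORE_IDX] = score
    return record
```

If the data layout or the sample size changes, only the definitions at the top need editing.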
Specifics
• Don’t use global variables
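One way to avoid globals is to pass everything a function needs as arguments. The sketch below uses invented names. The commented-out lines show the global-based pattern it replaces.

```python
# Global-based version to avoid: the result silently depends on state
# set somewhere else in the program.
#
#     scale_factor = 2
#     def scale_scores(scores):
#         return [s * scale_factor for s in scores]


def scale_scores(scores, factor):
    """Return scaled scores; all inputs arrive as explicit arguments."""
    return [s * factor for s in scores]
```

With explicit arguments, the same call with the same inputs always gives the same result, which also makes the function straightforward to unit test.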
Specifics
• Use sensible function names
start()
step1()
step2()
step3()
wrapup()
Specifics
• Use sensible function names
initializeParameters()
setPaths()
calculateRHS()
calculateLHS()
writeResults()
Specifics
• Use sensible variable names
x1 = income / population
ipc = income / population
income_per_capita = income / population
Specifics
• Serialize Frequently.
main() {
    preprocessData()
    extractFeatures()
    runBaselineExperiment()
    runNewExperiment()
    evaluateResults()
}
Specifics
• Serialize Frequently.
preprocess files.data > clean_files.data
extractFeatures clean_files.data > features.csv
runBaseline features.csv > baseline.results
runNewExperiment features.csv > new.results
evaluate baseline.results > baseline.report
evaluate new.results > new.report
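The same idea can be applied inside a single program by writing each intermediate result to disk and reloading it on later runs. This is a minimal sketch, assuming JSON-serializable results. The cached_step helper, the file names, and the toy data are all made up for illustration.

```python
import json
import os


def cached_step(path, compute):
    """Load a step's result from path if it exists, else compute and save it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    with open(path, "w") as f:
        json.dump(result, f)
    return result


def run_pipeline(work_dir):
    """Two toy stages, each serialized to its own file in work_dir."""
    raw = [" 3", "1 ", "2"]  # made-up raw input
    clean = cached_step(
        os.path.join(work_dir, "clean.json"),
        lambda: [int(x.strip()) for x in raw],
    )
    features = cached_step(
        os.path.join(work_dir, "features.json"),
        lambda: [x * x for x in clean],
    )
    return features
```

A second run skips any stage whose output file already exists, so a crash in a late stage does not force the early stages to rerun. Deleting a file forces that stage to recompute.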
Specifics
• When things get slow, use a profiler.
– Identify slow functions, and fix them.
– Some code needs to do a lot, so it can be slow.
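In Python, for example, the standard library ships the cProfile and pstats modules. The sketch below profiles a deliberately simple, made-up function and returns the report as text. The profile_call helper is an assumption of this example, not part of either module.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Toy workload standing in for a slow research function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

The report lists the functions that account for the most time, which is where optimization effort should go first.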
Recap
• Research Code should be released
– This is becoming more common, expected and, sometimes, required.
• Research Code needs to be good code.
– So you can reuse it.
– So you can release it.