tools for communicating in the computational sciences

Post on 23-Jan-2017

Brian M. Bot | Senior Scientist | Sage Bionetworks

clearScience

14 December 2012

Sage Bionetworks

I love Sage notebooks!

a non-profit organization which pilots a variety of components that are necessary to build a scientific research “commons”

why?

Sage Bionetworks

“We must guard against the acquisition of unwarranted influence, whether sought or unsought, by the Military [Medical] Industrial Complex”

- Dwight D. Eisenhower, 1961

not conducive to a ‘commons’

institutional incrementalism

individual tenure

proprietary short term solutions

enabling a commons

open data

accessible platform

clear communication

“The problem is that right now, it’s not easy to donate your data to health research.”

“The goal of Consent to Research is to play a part in the transformation of health from something we experience passively to something we experience actively.”

John Wilbanks, Chief Commons Officer

http://weconsent.us

open data

‣ compute
‣ hardware
‣ software
‣ data
‣ code

analysis environment

RESTful APIs

accessible platform

clear communication

Deception at Duke

research scandals represent merely the extreme of a continuum in the culture of academic research

the status quo tolerates poor communication of findings

[Figure: chart of reproducibility outcomes. Values shown: 6%, 21%, 8%, 11%, 54%. Categories: cannot reproduce; can reproduce in principle; can reproduce w/discrepancies; can reproduce from processed data w/discrepancies; can reproduce partially.]

Ioannidis J.P.A. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149-155 (2009) | doi:10.1038/ng.295

208,294,724 datapoints

124 pages supplemental material

?? lines unobtainable source code

?? version or architecture of statistical analysis program (R)

innumerable R packages and package dependencies

key R package “ClaNC” no longer available

442 citations

what is reproducible in principle is often not reproducible in practice

unidentified publication
‣ from journal with 5 year impact factor of 28
‣ article freely available for download
‣ data freely available for download
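The gaps listed for this publication (unknown program version, unknown architecture, untracked package dependencies) are the kind of detail that can be recorded automatically at analysis time. The following is only a minimal sketch of that idea, not the approach used by the publication or by Sage Bionetworks. It is written in Python purely for illustration, although the analysis discussed here used R, where `sessionInfo()` reports similar information. The function name `capture_environment` and the field names in the snapshot are hypothetical.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Return a JSON-serializable snapshot of the analysis environment.

    `packages` is a list of distribution names whose installed versions
    should be recorded alongside the results.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly instead of silently skipping it.
            versions[name] = None
    return {
        "language": "python",
        "language_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "architecture": platform.machine(),
        "operating_system": platform.system(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names below are placeholders for whatever the analysis imports.
    snapshot = capture_environment(["pip", "setuptools"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Storing a snapshot like this next to the data and code gives a later reader the version and architecture details that show up as "??" above. It does not by itself guarantee that a package will still be obtainable.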

how are we to move science forward

if we cannot understand what was done previously?

let’s go back to basics

scientific method

1. define a question
2. gather information and resources (background research)
3. form a hypothesis
4. test hypothesis experimentally
5. analyze experimental data
6. draw conclusions based on data
7. publish results
8. retest (frequently done by other scientists)

infinite

∞...

experimentally generate data

store on local server

analyze on local machine

write a document

submit to journal

sent to reviewers as pdf

printed on paper

accepted & digitally typeset

static pdf representation

static html representation

scientific claims are being artificially uncoupled from science itself

science is hard

communication is hard

(especially for scientists)

clearScience

re-imagining scientific communication

allow consumption of content at a variety of levels of complexity and abstraction

leverage Synapse RESTful APIs

“hand the keys over” to the reviewers

scientific communication needs to evolve along with science

clearScience

make it easy to do good science

Acknowledgements

Sage Bionetworks

David Burdick - Rockstar Engineer

Stephen Friend - President and CEO

Erich S. Huang - Director of Cancer Research

Mike Kellen - Director of Technology

External Partners

Myles Axton - Nature Genetics

Phil Bourne - PLoS Computational Biology

Josh Greenberg - Alfred P. Sloan Foundation

Kelly LaMarco - Science Translational Medicine

Eric Schadt - Mount Sinai School of Medicine