CLIR Fellows - Science Data - 14_0730

Presentation on July 30, 2014 to CLIR Post-Doctoral Fellows at Bryn Mawr College.

Transcript of CLIR Fellows - Science Data - 14_0730

Page 1: CLIR Fellows - Science Data - 14_0730

https://twitter.com/cory_foy/status/493759969045409792/photo/1

Page 2: CLIR Fellows - Science Data - 14_0730

On Science Research Data

Jeffrey Lancaster, Ph.D.

Emerging Technologies Coordinator

Columbia University Libraries

@j_lancaster

Page 3: CLIR Fellows - Science Data - 14_0730

How do you feel about being ever-increasingly bombarded by data?

Page 4: CLIR Fellows - Science Data - 14_0730

On Science Research Data

Page 5: CLIR Fellows - Science Data - 14_0730

On Science Research Data

Computer Science Chemical Biological Geological

Engineering Medical

Psychological Mathematical

Physical Astronomical

Etc.  

Page 6: CLIR Fellows - Science Data - 14_0730

Sciences Humanities Social Sciences

<1990s

Text Census Data

Experimental Data

Highly structured Wild West, y’all

Page 7: CLIR Fellows - Science Data - 14_0730

Sciences Humanities

Social Sciences

>2010s

Text

Census

Exp. Data

Digital Humanities

Regression Analysis

Data Mining

Code

Page 8: CLIR Fellows - Science Data - 14_0730

Practices in science research drive institutional approaches to research support.

Why?

Page 9: CLIR Fellows - Science Data - 14_0730

Data Lifecycle

Page 10: CLIR Fellows - Science Data - 14_0730

Data Lifecycle

Data Management Planning, Formats, Metadata, Storage, Funding, Etc.

Page 11: CLIR Fellows - Science Data - 14_0730

Data Lifecycle

Original Data, Collected Data

Page 12: CLIR Fellows - Science Data - 14_0730

Data Lifecycle

Formats, Metadata, Software, Methodology

Page 13: CLIR Fellows - Science Data - 14_0730

Data Lifecycle

Publishing, Copyright, Intellectual Property, Open Access, Repositories, (Alt)metrics

Page 14: CLIR Fellows - Science Data - 14_0730

Professor Nick Turro

(1938-2012)

Page 15: CLIR Fellows - Science Data - 14_0730

Applications? Plausible?

Useful? Novel?

New Questions

New Knowledge

Patents

Background Justification

Conferences Community Conversation

Data Analysis

Confirmation Reagents

Protocols

Learning Up to Speed

Researchers

Students

Justification

Grants

JOB & FUNDING

PUBLISH

PEOPLE

FUNDING

ANALYSIS & RESULTS EXPERIMENTS

RESEARCH PLAN

IDEA Big or small

Discussion

Conferences Talks

Articles

Thesis

Talks

The Research Workflow

Adapted from Laura Croft @ Nature

Page 16: CLIR Fellows - Science Data - 14_0730

Baseline: What’s the minimum you need to know about a field/subject to be helpful?

Page 17: CLIR Fellows - Science Data - 14_0730

Case Study 1: I’m on a Boat!

R/V Marcus G. Langseth

Page 18: CLIR Fellows - Science Data - 14_0730

Case Study 1: I’m on a Boat!

MCS Acquisition

Syntrak 960-24 SSI Seisnet active tape emulation

Hydrophone arrays

Sentry solid cable; 12.5 meter groups; 150 m sections; up to four towed; separation 50 - 150 meters

Source Arrays

4 x 10 gun strings; 9 active, one spare per string; 15 meter string length; 1650 cu. in. per string

Source Controller

DigiShot

MCS geometry sensors

Digicourse 5011 Compassbirds; Digicourse Digirange; Tailbuoy GPS; Source GPS (1 per string)

MCS Navigation

Concept Systems, Ltd Spectra, Sprint, Reflex

MCS QC

Syntrak SeisNet ProMaxx Focus

Communications

HighSeasNet; Inmarsat Sailor 500 FleetBroadband; Iridium Sailor Satellite Phone

Multibeam / Echosounder

Kongsberg EM122 1° x 1°; Knudsen 3260 Echosounder

Marine Mammals Observation / Mitigation

Seiche Passive Acoustic Monitoring Streamer; 2 x Fujinon Big Eye Binoculars

General

Bell BGM-3 Gravimeter; Geometrics 882 Magnetometer; RDI 75 kHz ADCP; Stbd Side A frame; Telescoping Stern Boom; Sippican Mk21 Expendable Probe Launcher; Teflon-lined Uncontaminated Seawater System; Seabird SBE21 Thermosalinograph; LDEO PCO2; RM Young Weather Station

Page 19: CLIR Fellows - Science Data - 14_0730

Activity: Spreadsheets

What do you observe about the data?

Can you describe the experiment that was being done?

What did the researcher do well?

What can be improved in how the data is kept/shared?

Page 20: CLIR Fellows - Science Data - 14_0730

Case Study 2: Breaking Bad

Page 21: CLIR Fellows - Science Data - 14_0730

Activity: Lab Notebooks

What do you observe about the data?

Can you describe the experiment that was being done?

What did the researcher do well?

What can be improved in how the data is kept/shared?

Page 22: CLIR Fellows - Science Data - 14_0730

Case Study 3: Needle in a Haystack

http://core-genomics.blogspot.com/2012/05/resources-for-public-understanding-of.html

Page 23: CLIR Fellows - Science Data - 14_0730

Big Data +

Data Science

CERN: approx. 1 PB/sec = 1000 TB/sec = 1000000 GB/sec

filtered to 1 GB/sec

http://arstechnica.com/science/2010/08/lhc-computing-grid-pushes-petabytes-of-data-beats-expectations/
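The unit arithmetic on this slide can be checked with a short sketch. It assumes decimal (SI) prefixes, as the slide does (1 PB = 1000 TB = 1,000,000 GB); the function and variable names are illustrative only, and the rates are the approximate figures quoted above.

```python
# Decimal (SI) storage units, expressed in gigabytes (assumption: 1 TB = 1000 GB).
GB_PER_UNIT = {"GB": 1, "TB": 1_000, "PB": 1_000_000}


def convert(value, from_unit, to_unit):
    """Convert a data rate between GB, TB and PB (per second)."""
    return value * GB_PER_UNIT[from_unit] / GB_PER_UNIT[to_unit]


raw_rate_gb = convert(1, "PB", "GB")   # approximate raw rate quoted on the slide
filtered_rate_gb = 1                    # rate after filtering, quoted on the slide
reduction_factor = raw_rate_gb / filtered_rate_gb

print(f"1 PB/sec = {convert(1, 'PB', 'TB'):,.0f} TB/sec = {raw_rate_gb:,.0f} GB/sec")
print(f"Filtering keeps about 1 part in {reduction_factor:,.0f}")
```

Put differently, the filtering step discards all but roughly one millionth of the raw stream before anything is stored.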

Page 24: CLIR Fellows - Science Data - 14_0730

Big Data +

Data Science

Institute for Data Sciences & Engineering:

•  Cybersecurity Center

•  Financial and Business Analytics Center

•  Foundations of Data Science Center

•  Health Analytics Center

•  New Media Center

•  Smart Cities Center

Page 25: CLIR Fellows - Science Data - 14_0730

Conversation: Code

What is special about code?

What do you need to know to help a patron code?

What are best practices for code use?

Could you find out the most used bits of code in 2014?

Page 26: CLIR Fellows - Science Data - 14_0730

Some disciplines have repositories. Some don’t.

Some institutions have repositories. Some don’t.

Page 27: CLIR Fellows - Science Data - 14_0730

figshare.com

Share research components to make them discoverable & citable; get metrics

Page 28: CLIR Fellows - Science Data - 14_0730

DOIs for Code

GitHub + Mozilla + Figshare → DOIs

Mozilla: Software Carpentry

Page 29: CLIR Fellows - Science Data - 14_0730

Future: Electronic Lab Notebooks

Page 30: CLIR Fellows - Science Data - 14_0730

Science metadata depends on the discipline.

Sort of.

Page 31: CLIR Fellows - Science Data - 14_0730

Some Problems with Science (Data)

Page 32: CLIR Fellows - Science Data - 14_0730

Scientists are lazy.

But: They’ll do what funders tell them to.

Page 33: CLIR Fellows - Science Data - 14_0730

Crowd-Funding Science

Funding science may no longer rely upon government. Interested people, engaged by social media presence, are key to raising money from the crowd.

Page 34: CLIR Fellows - Science Data - 14_0730

Reproducibility Initiative

Address the reproducibility of your research in a blind, fee-for-service validation.

Validated studies receive a Certificate of Reproducibility acknowledging that their results have been independently reproduced.

Page 35: CLIR Fellows - Science Data - 14_0730

(Some) Scientists are private.

About some things. And some scientists are not.

Page 36: CLIR Fellows - Science Data - 14_0730

ORCID, ResearcherID, etc.

Unique identifiers for researchers to cross-reference publications, activities, etc.

John Smith vs. J. Smith vs. John D. Smith vs. J. D. Smith vs. JD Smith vs. …

Wang Kim vs. W. Kim vs. Kim Wang vs. K. Wang …

ORCID: 0000-0003-0458-2127

ResearcherID: J-6870-2012
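Unlike a name, an ORCID iD can be checked mechanically: its final character is a check digit computed from the first 15 digits (ORCID documents this as ISO 7064 MOD 11-2), so many typos can be caught before records are cross-referenced. A minimal sketch, with hypothetical function names, using the iD shown on this slide:

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0003-0458-2127"))  # iD shown on the slide
print(is_valid_orcid("0000-0003-0458-2172"))  # same iD with two digits swapped
```

The check only confirms that an iD is well formed; it does not confirm that the iD belongs to a particular John Smith or Wang Kim, which still requires looking up the record.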

Page 37: CLIR Fellows - Science Data - 14_0730

Sharing doesn’t count.

Until now.

Page 38: CLIR Fellows - Science Data - 14_0730

Run My Code

Share code used to analyze data; others can implement the same methodology

•  Biology

•  Mathematics

•  Neuroscience

•  Statistics

•  Social sciences

•  Economics

•  Econometrics

•  Finance

•  Management

•  R

•  MATLAB©

•  C++

•  Fortran

•  RATS

•  More software will be added soon.

Oh, and it’s free!

Page 39: CLIR Fellows - Science Data - 14_0730

Altmetric(s)

Capture overall impact of a publication in blogs, tweets, mentions, news, etc.

Page 40: CLIR Fellows - Science Data - 14_0730

Applications? Plausible?

Useful? Novel?

New Questions

New Knowledge

Patents

Background Justification

Conferences Community Conversation

Data Analysis

Confirmation Reagents

Protocols

Learning Up to Speed

Researchers

Students

Justification

Grants

JOB & FUNDING

PUBLISH

PEOPLE

FUNDING

ANALYSIS & RESULTS EXPERIMENTS

RESEARCH PLAN

IDEA Big or small

Discussion

Conferences Talks

Articles

Thesis

Talks

So. Many. Tools.

Page 41: CLIR Fellows - Science Data - 14_0730

Digital Science

Page 42: CLIR Fellows - Science Data - 14_0730

Digital Science Librarian

Page 43: CLIR Fellows - Science Data - 14_0730

Questions?

Jeffrey Lancaster, Ph.D.

Emerging Technologies Coordinator

Columbia University Libraries

@j_lancaster

http://www.slideshare.net/jeffreylancaster/

Page 44: CLIR Fellows - Science Data - 14_0730

Science @ Columbia

columbiascience.tumblr.com