CLIR Fellows - Science Data - 14_0730
https://twitter.com/cory_foy/status/493759969045409792/photo/1
On Science Research Data
Jeffrey Lancaster, Ph.D.
Emerging Technologies Coordinator
Columbia University Libraries
@j_lancaster
How do you feel about being ever-increasingly bombarded
by data?
On Science Research Data
Computer Science, Chemical, Biological, Geological, Engineering, Medical, Psychological, Mathematical, Physical, Astronomical, etc.
Sciences Humanities Social Sciences
Before the 1990s
Text Census Data
Experimental Data
Highly structured → Wild West, y’all
Sciences Humanities
Social Sciences
2010s onward
Text
Census
Experimental Data
Digital Humanities
Regression Analysis
Data Mining
Code
Practices in science research drive institutional approaches to
research support.
Why?
Data Lifecycle
Data Management Planning: Formats, Metadata, Storage, Funding, etc.
Data Lifecycle
Original Data, Collected Data
Data Lifecycle
Formats, Metadata, Software, Methodology
Data Lifecycle
Publishing: Copyright, Intellectual Property, Open Access, Repositories, (Alt)metrics
Professor Nick Turro
(1938-2012)
Applications? Plausible?
Useful? Novel?
New Questions
New Knowledge
Patents
Background Justification
Conferences Community Conversation
Data Analysis
Confirmation Reagents
Protocols
Learning Up to Speed
Researchers
Students
Justification
Grants
JOB & FUNDING
PUBLISH
PEOPLE
FUNDING
ANALYSIS & RESULTS EXPERIMENTS
RESEARCH PLAN
IDEA Big or small
Discussion
Conferences Talks
Articles
Thesis
Talks
The Research Workflow
Adapted from Laura Cro/ @ Nature
Baseline: What’s the minimum you need to know about a field/subject
to be helpful?
Case Study 1: I’m on a Boat!
R/V Marcus G. Langseth
Case Study 1: I’m on a Boat!
MCS Acquisition
Syntrak 960-24 SSI Seisnet active tape emulation
Hydrophone arrays
Sentry solid cable; 12.5 meter groups; 150 m sections; up to four towed; separation 50 - 150 meters
Source Arrays
4 x 10 gun strings; 9 active, one spare / string; 15 meter string length; 1650 cu. in. per string
Source Controller
DigiShot
MCS geometry sensors
Digicourse 5011 Compassbirds; Digicourse Digirange; Tailbuoy GPS; Source GPS (1 per string)
MCS Navigation
Concept Systems, Ltd: Spectra, Sprint, Reflex
MCS QC
Syntrak SeisNet; ProMaxx; Focus
Communications
HighSeasNet; Inmarsat Sailor 500 FleetBroadband; Iridium Sailor Satellite Phone
Multibeam / Echosounder
Kongsberg EM122 1° x 1°; Knudsen 3260 Echosounder
Marine Mammals Observation / Mitigation
Seiche Passive Acoustic Monitoring Streamer; 2 x Fujinon Big Eye Binoculars
General
Bell BGM-3 Gravimeter; Geometrics 882 Magnetometer; RDI 75 kHz ADCP; Stbd Side A frame; Telescoping Stern Boom; Sippican Mk21 Expendable Probe Launcher; Teflon-lined Uncontaminated Seawater System; Seabird SBE21 Thermosalinograph; LDEO PCO2; RM Young Weather Station
Activity: Spreadsheets
What do you observe about the data?
Can you describe the experiment that was being done?
What did the researcher do well?
What can be improved in how the data is kept/shared?
Case Study 2: Breaking Bad
Activity: Lab Notebooks
What do you observe about the data?
Can you describe the experiment that was being done?
What did the researcher do well?
What can be improved in how the data is kept/shared?
Case Study 3: Needle in a Haystack
http://core-genomics.blogspot.com/2012/05/resources-for-public-understanding-of.html
Big Data +
Data Science
CERN: approx. 1 PB/sec = 1000 TB/sec = 1000000 GB/sec
filtered to 1 GB/sec
http://arstechnica.com/science/2010/08/lhc-computing-grid-pushes-petabytes-of-data-beats-expectations/
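The CERN figures on this slide can be checked with a little arithmetic. The sketch below assumes decimal (SI) prefixes, as the slide does (1 PB = 1000 TB = 1,000,000 GB). The per-day total is only an illustration that assumes the filtered stream runs continuously for 24 hours, which the slide does not claim; all variable names are made up for this example.

```python
# Unit check for the CERN slide, assuming decimal (SI) prefixes.
GB_PER_TB = 1000
TB_PER_PB = 1000


def pb_to_gb(petabytes):
    """Convert petabytes to gigabytes using decimal prefixes."""
    return petabytes * TB_PER_PB * GB_PER_TB


# Figures taken from the slide.
raw_rate_gb_per_sec = pb_to_gb(1)   # approx. 1 PB/sec before filtering
filtered_rate_gb_per_sec = 1        # filtered to 1 GB/sec

# How much of the raw stream is thrown away by filtering.
reduction_factor = raw_rate_gb_per_sec / filtered_rate_gb_per_sec

# Assumption for illustration only: the filtered stream runs 24 hours a day.
seconds_per_day = 24 * 60 * 60
filtered_tb_per_day = filtered_rate_gb_per_sec * seconds_per_day / GB_PER_TB

print(f"Raw rate: {raw_rate_gb_per_sec:,} GB/sec")
print(f"Filtering keeps 1 part in {reduction_factor:,.0f}")
print(f"Filtered data per day (if continuous): {filtered_tb_per_day} TB")
```

Even after a million-fold reduction, the filtered stream is large enough that storage and curation remain real problems, which is the point of the "needle in a haystack" framing.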
Big Data +
Data Science
Institute for Data Sciences & Engineering:
• Cybersecurity Center
• Financial and Business Analytics Center
• Foundations of Data Science Center
• Health Analytics Center
• New Media Center
• Smart Cities Center
Conversation: Code
What is special about code?
What do you need to know to help a patron code?
What are best practices for code use?
Could you find out the most used bits of code in 2014?
Some disciplines have repositories. Some don’t.
Some institutions have repositories. Some don’t.
figshare.com: Share research components to make them discoverable & citable; get metrics
DOIs for Code
GitHub + Mozilla + Figshare → DOIs
Mozilla: Software Carpentry
Future: Electronic Lab Notebooks
Science metadata depends on the discipline.
Sort of.
Some Problems with Science (Data)
Scientists are lazy.
But: They’ll do what funders tell them to.
Crowd-Funding Science: Funding science may no longer rely upon government. Interested people, engaged by social media presence, are key to raising money from the crowd.
Reproducibility Initiative: Address the reproducibility of your research in a blind, fee-for-service validation.
Validated studies receive a Certificate of Reproducibility acknowledging that their results have been independently reproduced.
(Some) Scientists are private.
About some things. And some scientists are not.
ORCID, ResearcherID, etc.: Unique identifiers for researchers to cross-reference publications, activities, etc.
John Smith vs. J. Smith vs. John D. Smith vs. J. D. Smith vs. JD Smith vs. …
Wang Kim vs. W. Kim vs. Kim Wang vs. K. Wang …
ORCID: 0000-0003-0458-2127
ResearcherID: J-6870-2012
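One reason an identifier beats a name string is that it can be checked mechanically. An ORCID iD is 16 characters whose last character is a check digit computed with the ISO 7064 MOD 11-2 scheme, so a mistyped iD can usually be caught before it is used to link records. The sketch below is a minimal illustration of that check; the function names are invented for this example, and it only validates the format and checksum, not whether the iD is actually registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string has ORCID shape and a matching check digit."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# The iD shown on the slide passes the check.
print(is_valid_orcid("0000-0003-0458-2127"))
# Changing the final digit breaks the checksum.
print(is_valid_orcid("0000-0003-0458-2128"))
```

No comparable check is possible for "J. Smith" versus "John D. Smith": the name variants on the slide can only be reconciled by guesswork or by attaching an identifier like this one.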
Sharing doesn’t count.
Until now.
Run My Code: Share code used to analyze data; others can implement the same methodology
• Biology • Mathematics • Neuroscience • Statistics • Social sciences
• Economics • Econometrics • Finance • Management
• R • MATLAB© • C++ • Fortran • Rats • More software will be added soon.
Oh, and it’s free!
Altmetric(s): Capture overall impact of a publication in blogs, tweets, mentions, news, etc.
Applications? Plausible?
Useful? Novel?
New Questions
New Knowledge
Patents
Background Justification
Conferences Community Conversation
Data Analysis
Confirmation Reagents
Protocols
Learning Up to Speed
Researchers
Students
Justification
Grants
JOB & FUNDING
PUBLISH
PEOPLE
FUNDING
ANALYSIS & RESULTS EXPERIMENTS
RESEARCH PLAN
IDEA Big or small
Discussion
Conferences Talks
Articles
Thesis
Talks
So. Many. Tools.
Digital Science
Digital Science Librarian
Questions?
Jeffrey Lancaster, Ph.D.
Emerging Technologies Coordinator
Columbia University Libraries
@j_lancaster
http://www.slideshare.net/jeffreylancaster/
Science @ Columbia
columbiascience.tumblr.com