Research · 2017-02-14 · • User / IT • Easily accessible ... Group permissions. Emerging...

Page 1:
Page 2:

Research Data Life Cycle (diagram)

• Stages: Plan, Acquire, Analyse, Comprehend, Access, Collaborate, Manage, Archive, Publish, Reuse
• Supporting services:
− HPC, cloud, virtual labs
− Dataset transfer, databases, web-based file sharing, collaborative sites
− Automated ingest and management
− RDM support
− Technical advice, costing, grant assistance
− Visualisation facilities
− Institutional repository

Page 3:

Emerging Researcher Series

What is eResearch?

Thursday, 9 February 2017

Research Liaison Jason van Rooyen, PhD

Page 4:

Overview

1. Research data and generators

2. Why eResearch?

3. Research challenges

4. Service catalogue

5. Summary

http://datablog.is.ed.ac.uk/files/2013/12/bitsissue8_2.png

Emerging Researcher Series #1 9 Feb 2017

Page 5:

Research Data and Generators

Page 6:

Data origins

Page 7:

Data role-players

• Producers (P):
− Student/researcher
− Facilities
− Downstream and re-analyses
• Consumers (C):
− Lab/group
− Collaborators
− Discipline / Community
• Managers (M):
− IT
− Data managers / libraries
− Administration
• Owners

[Diagram: researchers, libraries, admin, collaborators, IT, data scientists, platforms/facilities, and community arranged around the P, M, and C roles]

Page 8:

Data’s value to the community

• Intrinsic value:
− Evidence
− IP/commodity
− Productivity metrics
• Drivers for sharing data:
− Bottom-up: validation, re-use, publicity
− Top-down: community and journal standards, funding agency mandates, innovation regulations
• FAIR data: Findable, Accessible, Interoperable, Re-usable

Page 9:

Types of researchers

• Researchers differ in:
− Resources
− Skills and experiences
− Risk appetites
• “Rock Star” researchers:
− Scarce
− Well-funded
− Prioritised
− Large skilled teams
− Early adopters / innovators
− Risk takers
− Networked
• “Long Tail” researchers:
− Under-resourced
− Sometimes isolated
− Cost ($ & minutes) / risk averse
− Abundant

http://vignette1.wikia.nocookie.net/walkerjourn515/images/6/60/Rogers_adoption_curve_deaderick_version.png

Page 10:

Fields / Labs / Groups / Units all differ in capacity

• Types of groups:
− Academic groups / solitary PIs
− Facilities
− Programmes
• Staffing structures:
− Students
− Research staff
− IT/data engineers
− Programmers
• Infrastructure differences:
− Desktops
− Server rooms

Lee Lab (Jeannie T. Lee, M.D., Ph.D., Professor of Genetics and Pathology, Harvard Medical School): different needs

Page 11:

Why eResearch?

Page 12:

Why eResearch?

Research Revolution:

• Pace and scale increasing
• New tools and methodologies
• Internationalisation

Page 13:

Who is eResearch UCT?

UCT eResearch partners with research groups to accelerate and transform research, connecting them to the most appropriate services to support the research life cycle.

• New research strategy (2015-2025)
• Research life cycle:
− Forecasting and grant writing
− Data collection
− Analysis and computation
− Publication
− Data management
− Sharing & collaboration
− Profile-raising

Page 14:

ICTS: Engineers, Research support, Project management

Research Office: Communications

Libraries: Digital/Data services, Digital scholarship

Page 15:

Research Challenges

Page 16:

Managing volumes - challenges at scale

• Movement
− From processing to storage
• Findability and recoverability
− Context (metadata)
• Privacy & access
• Infrastructure
− Local, central
− Support, admin
− Backup, security
• Education in best practice and tools
• Costs
− Consumer/enterprise, lifecycle

Page 17:

Managing volumes @ UCT

• Sources: instruments, processing, collaborations
• 400 TB allocated
• Average allocated-to-provisioned ratio 2:1
• Current rate 40 TB/m provisioned
• 90 TB fast parallel storage on HPC (fhgfs)

[Chart: "Uptake Rate", storage provisioned in TB (axis 0 to 400) from 2014/09/18 to 2015/10/23; labelled groups: arceibo (74 TB), CASA (74 TB), SATVI (70 TB); annotation: astronomy]
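
As a rough worked example of the figures on this slide: with 400 TB allocated and a 2:1 allocated-to-provisioned ratio, about 200 TB is actually provisioned. The sketch below assumes that "TB/m" means TB per month and that provisioning continues at a constant rate; neither assumption is stated on the slide, and the function name is purely illustrative.

```python
def months_until_fully_provisioned(allocated_tb, ratio, rate_tb_per_month):
    """Estimate months until provisioned storage reaches the allocation.

    Assumes a constant provisioning rate, which is an illustrative
    simplification and not something the slide states.
    """
    provisioned_tb = allocated_tb / ratio          # e.g. 400 TB at 2:1
    remaining_tb = allocated_tb - provisioned_tb   # still to be provisioned
    return remaining_tb / rate_tb_per_month


allocated = 400   # TB allocated (from the slide)
ratio = 2         # allocated : provisioned (from the slide)
rate = 40         # TB per month (assumed reading of "40 TB/m")

months = months_until_fully_provisioned(allocated, ratio, rate)
print(f"Provisioned now: {allocated / ratio:.0f} TB")
print(f"Months to provision the full allocation: {months:.1f}")
```

Under these assumptions the remaining allocation would be provisioned within months rather than years, which is the planning pressure the slide points at.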

Page 18:

Managing volumes – data deluge

Field | Technique | Data rate
Geomatics | Laser scanning | ~ 4 TB / year
Neurosciences | MRI | ~ 5 TB / year
Biosciences | Next-Gen Sequencing | > 10 TB / year
Biophysics | Direct electron detectors (TEMs) | > 200 TB / year
 | Super-resolution microscopes | > 1 PB / year
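
To put the yearly data rates above in networking terms, the following sketch converts each figure into the average sustained bandwidth needed to move the data as it is produced. It assumes decimal units (1 TB = 10^12 bytes), a 365-day year, and continuous transfer; the ">" entries on the slide are lower bounds, so the corresponding results are lower bounds too. The function and dictionary names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def sustained_mbit_per_s(tb_per_year):
    """Average rate in Mbit/s needed to move tb_per_year continuously.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Mbit = 10**6 bits.
    """
    bits_per_year = tb_per_year * 10**12 * 8
    return bits_per_year / SECONDS_PER_YEAR / 10**6


# Data rates in TB/year as listed on the slide (1 PB taken as 1000 TB).
rates_tb_per_year = {
    "Laser scanning": 4,
    "MRI": 5,
    "Next-Gen Sequencing": 10,
    "Direct electron detectors (TEMs)": 200,
    "Super-resolution microscopes": 1000,
}

for technique, tb in rates_tb_per_year.items():
    print(f"{technique}: {sustained_mbit_per_s(tb):.1f} Mbit/s sustained")
```

Even 1 PB per year averages out to roughly 250 Mbit/s; in practice instruments produce data in bursts, so peak requirements can be much higher than these averages.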

Page 19:

Sharing data - challenges

• Managing access permissions
− Who and how
− Internal vs. external collaborators
− Privacy and POPI Act
• Small vs. large
− Tools
− Firewalls
− Bandwidth
• Costs
− Bandwidth
− Importance vs. other traffic
− Licence fees

//researchdata

Page 20:

Sharing data @ UCT

• Data is shared with:
− Project members
− Collaborators (internal & external, local & international)
− Journals & repositories
• Using array of tools

[Chart: terabytes transferred per month (axis 0 to 60), Sep-14 to Nov-16; legend "ARC Globus Endpoint": heinedej#ARC-Ubuntu, heinedej#H3ABioNet, heinedej#eResearchUCT; annotation: + 140 TB to Wits]
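
The "+ 140 TB to Wits" annotation gives a sense of the transfer sizes involved. As a hedged illustration, the sketch below estimates how long a transfer of that size would take at a few link speeds. The link speeds and the efficiency factor are hypothetical values chosen for the example, not figures from the slide, and real transfers depend on the tools, firewalls, and competing traffic.

```python
def transfer_days(terabytes, link_gbit_per_s, efficiency=1.0):
    """Days needed to move `terabytes` over a link of `link_gbit_per_s`.

    `efficiency` is the fraction of the nominal link speed actually
    achieved (1.0 means the full line rate, which is optimistic).
    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbit = 10**9 bits.
    """
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_gbit_per_s * 10**9 * efficiency)
    return seconds / 86400


# Hypothetical link speeds and a hypothetical 50% achieved throughput.
for gbit in (0.1, 1.0, 10.0):
    ideal = transfer_days(140, gbit)
    realistic = transfer_days(140, gbit, efficiency=0.5)
    print(f"{gbit:>5} Gbit/s: {ideal:7.1f} days ideal, {realistic:7.1f} days at 50%")
```

At a nominal 1 Gbit/s, 140 TB takes about 13 days even at full line rate, which is why the bandwidth items on the preceding slide matter.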

Page 21:

Efficient analyses - challenges

• Appropriate infrastructure
− Local, central, cloud
• Staff
− Time & skills
− Hardware & systems
− Training students
• Managing and storing processing results
• Standardizing workflows
• Resourcing / sustainability
− Costs for start-up
− Lifecycle & upgrades
− Seed funding
• Suitability of central compute resources
− Allocations, system, support

Page 22:

Efficient analyses @ UCT

[Figure labels: 2011, 2015]

• HPC @ UCT (http://hpc.uct.ac.za/):
− Inception in 2009
− 5-fold expansion to 1 450 cores (end 2013) -> exponential increase in usage
− GPU servers
− Community has consumed 12 million compute hours
• VMs
− ± 40
• ARC
− 15 compute nodes
− 256GB of RAM per node
− 360 processing cores
− Over 400TB storage
− NW - 500TB object storage

Page 23:

Archiving and preserving - challenges

• Deciding what to keep
− Triage – raw, processed, versions
• Metadata
− Which metadata to keep
− How to keep it associated with the data
• Replication vs. backup vs. archiving
− Best systems
− Infrastructure
• Sustainability
− Ownership of data
− Long-term costs
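
One common way to keep metadata associated with data is a "sidecar" file stored next to each data file. The sketch below is a minimal illustration of that idea, not a description of any UCT service: the `.meta.json` naming scheme, the `write_sidecar` function, and the example metadata fields are all hypothetical. It records a SHA-256 checksum so that the link between metadata and data can be verified later.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_sidecar(data_path, metadata):
    """Write '<data file name>.meta.json' next to the data file.

    The sidecar holds the file name, a SHA-256 checksum of the data,
    and whatever descriptive metadata the caller supplies.
    """
    data_path = Path(data_path)
    digest = hashlib.sha256()
    with data_path.open("rb") as handle:
        # Hash in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    record = {"file": data_path.name, "sha256": digest.hexdigest(), **metadata}
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


# Demonstration on a throwaway file; the metadata values are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "scan_001.dat"
    sample.write_bytes(b"example instrument output")
    sidecar = write_sidecar(sample, {"instrument": "hypothetical-scanner", "project": "demo"})
    print(sidecar.read_text())
```

A sidecar survives copying as long as both files travel together; repository and ingest tools typically store the same information in a database instead.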

Page 24:

Publishing data – challenges

• Staying compliant:
− Agencies / owners
− Deriving the most value for investors
• Deciding what to share
− Tension between competitiveness and openness (patents)
• Where to put the data and how to fund it?
• Sharing large data
• Tracking impact, attribution, and proving compliance

Page 25:

Supporting data-intensive research with ICT

• Increased connectivity of researchers
• Security vs. convenience
• Policy challenges (data ownership)
• Sustainability
− Charge model or subsidy
− Brokerage (connecting to competitors)
− Seed funding

Image: permabit.com/data-affordability-gap/

Page 26:

Ideal Services for a Modern University

Page 27:

Data management and planning

• Planning assistance
− Costing
− DMP team and tool
− Funder guidelines
− Data policy
• Acquisition / ingest
− Tools (iRODS/MyTardis)
− Support
• Training
• Compliance monitoring
• eRA integration
• Institutional repositories
• Collaboration spaces

Page 28:

Data storage

• Convenient
− User / IT
− Easily accessible
− Shareable
• Applicable
− Fast HPC
− Archival
− Open
• Secure
− Backups
− Private
• Scalable
− Tiered storage
• Affordable
− Graduated costs

Page 29:

Data movement

• Intuitive
− Non-sysadmins
• Scale appropriate
− Convenient
− One-sided?
• Optimized transfers
− DMZs
− Scheduled
• Performance monitoring
• Sustainable service
− Impact on network
− Costs

Wikipedia

Page 30:

Data analysis

• Suitable
− Scale (cores & memory)
− Flexible (efficiency)
− Fit for purpose (service, HPC, big data)
• Supported
− Admin
− Porting
− Teaching
• Permit learning
− Free allocations
− Suitable rights
• Integrated
− Storage
− Sharing services
• Governed
− Flexible
− Transparent
− Accommodation for collaborations and groups
• Shareable
− Common workflows
− Group permissions

Page 31:

Enabling Open Data

• Assistance with research data management (RDM):
− RDM policy
− Funder guidelines
− DMPonline
− Guidelines for depositing data
− Guidelines for sharing data
• Implementation of preservation infrastructure
− Preservation of research data via UCT Libraries preservation infrastructure, Archivematica
− Storage of research data via storage facilities at ICTS
− Dissemination, access and reuse via UCT online repository
• Institutional repository

Page 32:

Training and education

• Data science
− HPC
− Data analytics courses
− Data carpentry
− Digital humanities
• Data management
− Library carpentry
• Scientific software development
− Software carpentry
• Sysadmins
− Research IT
− Storage, network, compute, cloud

Page 33:

Data visualisation

• Interrogation
− Scale & resolution
− Immersion & 3D
• Collaboration
• Outreach

Facilities: VR, Visualisation wall, Digital Dome

Page 34:

Summary

Why eResearch?
To accelerate outputs and competitiveness in support of UCT’s research agenda.

How do I get hold of eResearch?
[email protected] or www.eresearch.uct.ac.za

What do eResearch services cost?
Our cost model is available on the website at: http://www.eresearch.uct.ac.za/billing-model

Can staff and students both make use of eResearch services?
Absolutely; if you are a researcher, you can work with eResearch.

Do you work with individual researchers or only communities?
We prefer to work with communities of researchers, because in this way our efforts have the greatest impact for the least cost.

Do you work with Humanities and Social Sciences, or only with Sciences?
We are happy to assist any researcher.

Page 35:

Questions?

https://tinyurl.com/ztoug6s www.eresearch.uct.ac.za