2014 05-27 - Opinion: Computing for genomics sucks.

Why computing for genomics research sucks. [email protected] BaltiBio 2014-05-27

description

Some thoughts on 1. why genomics bioinformaticians need hardware that differs from what traditional HPC providers offer 2. With input from @bmpvieira, @yeban, @gawbul.

Transcript of 2014 05-27 - Opinion: Computing for genomics sucks.

Page 1: 2014 05-27 - Opinion: Computing for genomics sucks.

Why computing for genomics research sucks.

[email protected]

BaltiBio 2014-05-27

Page 2: 2014 05-27 - Opinion: Computing for genomics sucks.

Example Genomics Tasks

Task                                          Repetitiveness  “Disk” Input/Output  Memory   Duration per task
Build 10,000 trees                            10,000x         low                  low      short
Trim FASTQ files                              40-400x         high                 low      short
One de novo genome assembly                   1               high                 high     long
Many de novo genome assemblies                20-1000x        high                 high     long
Determine which of 10 new tools that promise  1               depends              depends  depends
X can actually do X (once): “genome hacking”

Page 3: 2014 05-27 - Opinion: Computing for genomics sucks.

Traditional High Performance Computing (HPC)

• Physics? Astronomy? Maths? Chemistry?

• Traditional HPC infrastructures are great at small tasks:

Task                                          Repetitiveness  “Disk” Input/Output  Memory   Duration per task
Build 10,000 trees                            10,000x         low                  low      short

• And/or have mechanisms/tools that transform their challenges into many small tasks.
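The task table from Page 2 can be read as a rule of thumb for what suits traditional HPC. The sketch below is one possible reading, not something from the talk: the `Task` class and the `fits_traditional_hpc` function are illustrative names, and the rule (low I/O, low memory, short duration) is an assumption drawn from the "small tasks" bullet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    repetitions: str  # as written on the slide, e.g. "10,000x"
    disk_io: str      # "low", "high" or "depends"
    memory: str       # "low", "high" or "depends"
    duration: str     # "short", "long" or "depends"


# Rows copied from the "Example Genomics Tasks" table.
TASKS = [
    Task("Build 10,000 trees", "10,000x", "low", "low", "short"),
    Task("Trim FASTQ files", "40-400x", "high", "low", "short"),
    Task("One de novo genome assembly", "1", "high", "high", "long"),
    Task("Many de novo genome assemblies", "20-1000x", "high", "high", "long"),
    Task("Test which of 10 new tools can do X", "1", "depends", "depends", "depends"),
]


def fits_traditional_hpc(task: Task) -> bool:
    """Assumed rule of thumb: only small tasks (low I/O, low memory, short)."""
    return (task.disk_io, task.memory, task.duration) == ("low", "low", "short")


if __name__ == "__main__":
    for task in TASKS:
        verdict = "fits" if fits_traditional_hpc(task) else "does not fit"
        print(f"{task.name}: {verdict} the small-task model")
```

Under this rule only the tree-building row qualifies, which matches the single row shown on this slide.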

Page 4: 2014 05-27 - Opinion: Computing for genomics sucks.

“We have 9999 cores!” - central IT admin

but they are inadequate

Page 5: 2014 05-27 - Opinion: Computing for genomics sucks.

Big Ass Servers

• e.g.: 1.5 TB RAM; 48 cores - SSH into it and do whatever you want.

Task                                          Repetitiveness  “Disk” Input/Output  Memory   Duration per task
Build 10,000 trees                            10,000x         low                  low      short
Trim FASTQ files                              40-400x         high                 low      short
One de novo genome assembly                   1               high                 high     long
Many de novo genome assemblies                20-1000x        high                 high     long
Determine which of 10 new tools that promise  1               depends              depends  depends
X can actually do X (once).

Jeremy Leipzig

Page 6: 2014 05-27 - Opinion: Computing for genomics sucks.

Additional challenges for biologists

• Datasets continue growing fast!

• Generally:

• We lack computational training.

• Bioinformatics tools suck (badly written, badly tested, hard to install).

Page 7: 2014 05-27 - Opinion: Computing for genomics sucks.

So what do we need?

• access to machines of all shapes and sizes

• big and small machines

• direct access via ssh (for hacking & doing things few times)

• indirect access via queue (for doing things many times)

• fast I/O - cheap archival.

• single login: all files “feel” like they’re in one place

Page 8: 2014 05-27 - Opinion: Computing for genomics sucks.

Swiss Institute of Bioinformatics: Vital-IT

Page 9: 2014 05-27 - Opinion: Computing for genomics sucks.

So what do we need?

• access to machines of all shapes and sizes

• big and small machines

• direct access via ssh (for hacking & doing things few times)

• indirect access via queue (for doing things many times)

• fast I/O - cheap archival.

• single login; all files “feel” like they’re in one place

• easily changeable software & OS versions

Page 10: 2014 05-27 - Opinion: Computing for genomics sucks.

Easily changeable OS & software versions

https://www.docker.io

> docker-switch bio-linux7
# do stuff
> docker-switch pacbio-assembly-vm
# do other stuff
> docker-switch antlab-ubuntu
# do more stuff

@bmpvieira

Page 11: 2014 05-27 - Opinion: Computing for genomics sucks.

Easily changeable OS & software versions

https://www.docker.io

> docker-switch bio-linux7
# do stuff
> docker-switch pacbio-assembly-vm
# do other stuff
> docker-switch antlab-ubuntu
# do more stuff

FAKE

@bmpvieira
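The `docker-switch` command on this slide is marked as fake. As a rough sketch of how something similar might be approximated with the real `docker run` command, the helper below only builds the command line. The `switch_command` function name and the `/data` mount point are assumptions for illustration, the image names are the hypothetical ones from the slide, and the sketch assumes each image ships a `bash` shell.

```python
import shlex


def switch_command(image, workdir):
    """Build a `docker run` command that opens a shell in `image`.

    The host directory `workdir` is mounted at /data (an arbitrary choice)
    so the same files are visible whichever environment is active.
    """
    return [
        "docker", "run",
        "--rm",                    # remove the container on exit
        "-it",                     # interactive terminal
        "-v", f"{workdir}:/data",  # share the project directory
        "-w", "/data",             # start inside the shared directory
        image,
        "bash",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; the images are hypothetical.
    for image in ["bio-linux7", "pacbio-assembly-vm", "antlab-ubuntu"]:
        print(shlex.join(switch_command(image, "/home/me/project")))
```

Passing the returned list to `subprocess.run` would start the shell, provided Docker is installed and an image with that name actually exists.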

Page 12: 2014 05-27 - Opinion: Computing for genomics sucks.
Page 13: 2014 05-27 - Opinion: Computing for genomics sucks.

What if Apple/Google made an idiot-proof cloud computing system for genomics?

Page 14: 2014 05-27 - Opinion: Computing for genomics sucks.

What if Apple/Google made an idiot-proof cloud computing system for genomics?

• Always on - single place to connect to:

ssh mylab.awskiller.co.uk

• Dropbox-like shared directories & file checksumming.

• Easily switchable OS version / “VM”.

• Automagically & transparently migrates:

    • from small to huge machines (and back) as CPU and RAM demands change.

Page 15: 2014 05-27 - Opinion: Computing for genomics sucks.

What if Apple/Google made an idiot-proof cloud computing system for genomics?

• Always on - single place to connect to:

ssh mylab.awskiller.co.uk

• Dropbox-like shared directories & file checksumming.

• Easily switchable OS version / “VM”.

• Automagically & transparently migrates:

    • from small to huge machines (and back) as CPU and RAM demands change.

    • from one physical site (huge dataset) to another
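The "file checksumming" wish in the list above can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not the system the slide imagines: the function names `file_sha256` and `same_content` are illustrative, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file.

    The file is read in chunks so that large sequence files do not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original, copy):
    """True if both files have identical contents according to SHA-256."""
    return file_sha256(original) == file_sha256(copy)


if __name__ == "__main__":
    # Small demonstration with a throwaway FASTQ-like file and its copy.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "reads.fastq"
        dst = Path(tmp) / "reads_copy.fastq"
        src.write_bytes(b"@read1\nACGT\n+\nIIII\n")
        dst.write_bytes(src.read_bytes())
        print(same_content(src, dst))
```

Comparing digests after a transfer between machines or sites is one way to confirm that a shared file arrived intact.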

Page 16: 2014 05-27 - Opinion: Computing for genomics sucks.

Summary

• Broad range of needs:

    • some similar to traditional HPC.

    • some very different.

• Users are naive.

• Tools are experimental.

• Datasets are experimental.

• IT people have difficulty understanding this.

    • Do not trust them when they say things will just work!

• A lot of potential to make things not suck.

Page 17: 2014 05-27 - Opinion: Computing for genomics sucks.

Evolutionary Genetics group & Queen Mary U London

Bruno Vieira - @bmpvieira

Steve Moss - @gawbul

Anurag Priyam - @yeban

Richard Christie & ITS Research Support team @ Queen Mary U London

Ioannis Xenarios & Vital-IT team @ Swiss Institute of Bioinformatics

http://[email protected]