Fast Primer Search with DUP · 2018. 12. 16. · Case Study: Fast Primer Search1 { Biological...

Post on 26-Sep-2020

0 views 0 download

Transcript of Fast Primer Search with DUP · 2018. 12. 16. · Case Study: Fast Primer Search1 { Biological...

Christian Grothoff

Fast Primer Search with DUP

Christian Grothoffchristian@grothoff.org

Technische Universitat Munchen

1

Christian Grothoff

Overview

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search1

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results1“An algorithm for the comprehensive search of oligonucleotide signatures based

on phylogenetic trees”, joint work with K. Bader, W. Ludwig and H. Meier

2

Christian Grothoff

Problem Domains• Systems

• Programming Languages

• Software Engineering

• Secure Networking

• Privacy

RunaboutHashMap<Class,Code> mapvisitAppropriate(Object):void

RunaboutSumsum:intvisit(A0):voidvisit(A1):voidvisit(A2):void

Codeaccept(Object):void

GenCodeXX

0..*1

3

Christian Grothoff

The Problem:Developing Parallel Stream Applications

• Most developers (only) know how to write sequentialcode

• Parallel programing is error-prone (data races,deadlocks)

• High-performance parallel programming is really hard

• With GPUs for $4,000, we could have 2,600 cores...

⇒ Developers more expensive than hardware

4

Christian Grothoff

X10 vs. the DUP System2

X10

10x faster, 10x as productive in 10 years for BlueGene

DUP

12 the speed, 10x as productive in 10 months for POSIX

2Available at http://dupsystem.org/

5

Christian Grothoff

Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results

6

Christian Grothoff

The DUP System

7

Christian Grothoff

DUP Applications

• Fast primer search

• High-throughput, customizable video-conferencing

• Parallel and distributed discrete event simulationframework

8

Christian Grothoff

Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results

9

Christian Grothoff

Biological Questions

• Which are the most specific OSSs for a species?

• Which are the most specific OSSs for a subtree in thephylogenetic tree?

10

Christian Grothoff

Input: Phylogenetic Tree1

2

3

4

5

6

I

IIa

IIb

IIIa

IIIb

11

Christian Grothoff

Parallel Mapping of Primers to Speciess @opt1:88[0<in.txt,1|p1:0,3|p2:0] $ fanout;

p1@opt1:88[1|pe:0] $ arb_probe_dup;

p2@opt2:88[1|pe:3] $ arb_probe_dup;

pe@opt2:88[1>out.txt] $ gather;

0

2

4

6

0 2 4 6 8 10 12 14 16

Spe

ed-u

p

# Nodes

replicated,exactreplicated,approximate

partitioned,exactpartitioned,approximate

12

Christian Grothoff

Intermediate Result: Species and OSS

1

2

3

4

5

6

E

A

B

F

C

G

D

H

13

Christian Grothoff

Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results

14

Christian Grothoff

Biological Questions

• Which are the most specific OSSs for a species?

• Which are the most specific OSSs for a subtree in thephylogenetic tree?

• Sequence information may contain errors; allow up to kout-group hits!

• Perfect OSS may not exist even with out-group hits;maximize number of in-group hits

15

Christian Grothoff

BGRT Creation (1/5)1,2 : A

1,2 : A

16

Christian Grothoff

BGRT Creation (2/5)2,3 : B

1,2 : A 1,2 : A

2,3 : B

17

Christian Grothoff

BGRT Creation (3/5)

1,2 : A

2,3 : B

1,3 : C

1

2,3 : B

3 : C

2 : A

18

Christian Grothoff

BGRT Creation (4/5)

1

2,3 : B

3 : C

2 : A

1,2,3 : D

1

2,3 : B

3 : C

2 : A

3 : D

19

Christian Grothoff

BGRT Creation (5/5)

1 : E

2 : A

3 : D

3 : C

2,3 : B

5 : G

4 : F

6 : H

20

Christian Grothoff

Iterate-And-Bound

1

2

3

4

5

6

I

IIa

IIb

IIIa

IIIb

I IIa IIb IIIa IIIb

3:0 2:1 1:2 1:2 0:3

I IIa IIb IIIa IIIb

2:0 2:0 0:2 0:2 0:2

I IIa IIb IIIa IIIb

1:0 1:0 0:1 0:1 0:1

I IIa IIb IIIa IIIb

2:0 1:1 1:1 1:1 0:2

I IIa IIb IIIa IIIb

2:0 1:1 1:1 1:1 0:2

I IIa IIb IIIa IIIb

3:0 1:2 2:1 1:2 1:2

I IIa IIb IIIa IIIb

1:0 0:1 1:0 1:0 0:1

I IIa IIb IIIa IIIb

2:0 0:2 2:0 1:1 1:1

1 : E

2 : A

3 : D

3 : C

2,3 : B

5 : G

4 : F

6 : H

21

Christian Grothoff

Final Result

PhaseI IIa 1 2 IIb IIIa 3 4 IIIb 5 6

Mis

mat

ches 0 3:D,G 2:A 1:E % 2:H 1:F % 1:F % % %

1 % 2:D+ 1:A,C+ 1:A,B 2:G+ 1:B,C,H+ 1:B,C 1:H+ 1:H % 1:H

2 % 1:G∗ 1:D+ 1:D,G+ 1:D∗ 1:D,G+ 1:D+ % 1:G+ 1:G %3 % % % % % % % % 0:D∗ % %

“+” Should probably not be computed (mismatches >, matches =)

“∗” Even more useless (mismatches >, matches <)

22

Christian Grothoff

Agenda

• Research Area

• Easy Distributed Stream Processing with DUP

• Case Study: Fast Primer Search

– Biological Questions

– Primer Search Parallelization with DUP

– Mapping Primers to Species with the BGRT

• Other Research Results

23

Christian Grothoff

Other Current Work

• Successful attacks on various “secure” P2P networks(Tor, Freenet, Tahoe LAFS, ...)

• Randomized Resilient Routing in Restricted RouteTopologies

• Autonomous NAT traversal

• Automatic Restart Management: RAS for GNU/Linux

• Parallelizing Protein Analysis (with Rost Lab)

24

Christian Grothoff

An Attack on Tor (USENIX Security 2009)

Client

Tor Node 3 - Our Exit Node

Server

Tor Node 1 - Unknown Node Malicious Client

Tor Node 2 - Known High BW Tor Node 1

High BW Tor Node 2 Malicious Server

25

Christian Grothoff

Future Work

• Resource allocation for DUP

• Parallelize more applications with DUP

• Aspect-oriented coordination language for DUP

• Migrating to IPv6 using secure P2P VPN over GNUnet

• Memory fragmentation analysis (Mallice)

26

Christian Grothoff

Questions

?

27

Christian Grothoff

The GNUnet Framework

ARM

Transport

TCP, UDP, HTTP, ...

Testing

Routing

Authoring

DatastoreEncryption

DV DHT

GAP

MySQL

sqLite

Postgres

28

Christian Grothoff

GNUnet’s Automatic Restart Manager

Bug Tracking System

DeveloperConfigurationUser

Bug

ARM

Dependency Learning

and Management

Process

Process

requires services

changes configuration

monitors

configuration

changes

starts

required

services

crashes

restarts

impacted

processes

reports

crash

investigates

bugs

analyzes crash,

adds instrumentation,

restarts serviceuses

dependency

29

Christian Grothoff

Secure Routing (1/3)

A

BC

D

E

FG

H

Base Restricted Route Topology

30

Christian Grothoff

Secure Routing (2/3)

A

BC

D

E

FG

H

Distance Vector Added Links

31

Christian Grothoff

Secure Routing (3/3)

000

001101

011

111100

010

110

Resulting DHT Topology

32

Christian Grothoff

Teaching

• Programming Languages

• Mainframe Administration

• Computer Networking

• Peer-to-Peer Networking

33

Christian Grothoff

Group InfrastructureSoftware:• Drupal / Mantis

• Doxygen / Subversion

• Buildbot / lcov

• clang (LLVM)

• Coverity Prevent

Hardware:• Server (i7)

• Sheeva Plug (ARM)

• iMac (PowerPC)

• GNU/Linux VMs (AMD)

• PMP-capable NAT router

Furthermore, we have access to a wide range of networking

equipment (via Lehrstuhl) and HPC facilities at the LRZ.

34