Transcript of "Status of the vector transport prototype", Andrei Gheata, 12/12/12.

Page 1

Status of the vector transport prototype

Andrei Gheata, 12/12/12

Page 2

Current implementation

[Diagram of the current implementation: a main scheduler and a dispatch & garbage-collect thread handle pick-up, transportable and recycled baskets as well as full and recycled track collections; they loop over tracks and push them to per-volume (ivolume) basket slots (0 … n), inject priority baskets and recycle empty ones. Worker threads pick up transportable baskets, run Stepping(tid, &tracks) and push/replace collections of crossing tracks (itrack, ivolume) through a deque. A digitize & I/O thread consumes the hits, runs Generate(Nevents) and Digitize(iev), and flushes the output to disk through a second deque.]
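A minimal sketch of the pick-up loop run by the worker threads, assuming hypothetical Basket, TrackCollection and concurrent_queue types (the real prototype classes differ): baskets are popped from a shared queue, stepped, and the crossing tracks are pushed back as a collection for the dispatch thread.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

template <typename T>
class concurrent_queue {          // blocking work queue shared by the threads
public:
  void push(T item) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(item)); }
    cv_.notify_one();
  }
  T pop() {                       // blocks until an item is available
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    T item = std::move(q_.front());
    q_.pop();
    return item;
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
};

struct Basket { int ivolume; std::vector<int> tracks; };                // transportable basket
struct TrackCollection { std::vector<std::pair<int, int>> crossing; };  // (itrack, ivolume) pairs

concurrent_queue<Basket> transportable;         // filled by the main scheduler
concurrent_queue<TrackCollection> collections;  // consumed by the dispatch & garbage-collect thread

void Stepping(int /*tid*/, Basket &b, TrackCollection &out) {
  // toy stepping: every track of the basket is simply recorded as crossing
  // into the next volume, to be re-basketized by ivolume later
  for (int itrack : b.tracks) out.crossing.push_back({itrack, b.ivolume + 1});
}

void WorkerThread(int tid) {
  for (;;) {
    Basket b = transportable.pop();              // pick up a transportable basket
    TrackCollection coll;
    Stepping(tid, b, coll);                      // vector stepping on the basket
    collections.push(std::move(coll));           // push crossing tracks for dispatch
    // the emptied basket would be recycled back to the scheduler here
  }
}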

Page 3

Extending the transport data flow

• Bottlenecks in scheduling?
• Other types of work and resources sharing the concurrency model?

[Diagram: the current prototype (scheduler exchanging baskets with tracks and crossing tracks with Transport()) extended with digitization and I/O: hit blocks feed Digitize(block) and ProcessHits(vector) through a deque carrying priority events; the resulting digits data per event (Ev 0 … Ev n) fill I/O buffers of buffered events (Ev 0 … Ev n), which are written to disk.]
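A toy sketch of the extended part of the flow, with illustrative HitBlock, DigitsData and EventBuffer types (not the prototype's classes): hit blocks are digitized per event and the digits are accumulated into per-event I/O buffers that are flushed to disk.

#include <cstdio>
#include <map>
#include <utility>
#include <vector>

struct HitBlock   { int iev; std::vector<float> hits; };   // hits produced during transport
struct DigitsData { int iev; std::vector<int> digits; };   // digitized output for one block

DigitsData Digitize(const HitBlock &block) {
  DigitsData d{block.iev, {}};
  for (float h : block.hits) d.digits.push_back(static_cast<int>(h * 100));  // toy digitization
  return d;
}

class EventBuffer {                         // I/O buffer for one event
public:
  void Add(DigitsData d) { digits_.push_back(std::move(d)); }
  void Flush(int iev) {                     // stands in for writing the event to disk
    std::printf("event %d: %zu digit blocks written\n", iev, digits_.size());
    digits_.clear();
  }
private:
  std::vector<DigitsData> digits_;
};

int main() {
  std::map<int, EventBuffer> buffers;       // buffered events, Ev 0 ... Ev n
  std::vector<HitBlock> blocks = {{0, {1.f, 2.f}}, {1, {3.f}}, {0, {4.f}}};
  for (const auto &b : blocks) buffers[b.iev].Add(Digitize(b));
  for (auto &kv : buffers) kv.second.Flush(kv.first);  // flush completed events
}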

Page 4

Runnable and executor (Java)

• A simple task concurrency model based on a Run() interface
• Single queue management for all the different processing tasks
  – Minimizing the overhead of work balancing
  – Priority management at the level of the work queue
• In practice, our runnables are transport, digitization, I/O, …
  – Lower-level splitting possible: geometry, physics processes, …
• Flow = a runnable producing other runnables that can be processed independently
• Further improvements:
  – Scheduling executed by the worker threads (no need for a separate scheduler)
  – Workers busy -> the same thread processes its own runnable result(s)

[Diagram: a Task combines Data<Type> with a Run() method into a Runnable; Runnables are pushed to a concurrent queue<Runnable> served by the Executor, which hands back Futures for the results.]
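A minimal C++ sketch of this runnable/executor model (the slide refers to the Java interfaces); the names and structure are illustrative, not the prototype's API. A single concurrent queue serves all task types, and a runnable may return further runnables that the worker either re-submits or processes itself.

#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

struct Runnable {
  virtual ~Runnable() = default;
  // Run() may produce follow-up runnables that can be processed independently
  virtual std::vector<std::unique_ptr<Runnable>> Run() = 0;
};

class Executor {
public:
  explicit Executor(int nthreads) {
    for (int i = 0; i < nthreads; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }
  void Submit(std::unique_ptr<Runnable> r) {
    { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(r)); }
    cv_.notify_one();
  }
  // shutdown/join of the workers is omitted in this sketch
private:
  void WorkerLoop() {
    for (;;) {
      std::unique_ptr<Runnable> r;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !queue_.empty(); });
        r = std::move(queue_.front());
        queue_.pop();
      }
      auto produced = r->Run();     // transport, digitization, I/O, ...
      // a busy worker could process its own results directly instead of
      // paying the re-queueing cost; here they are simply re-submitted
      for (auto &next : produced) Submit(std::move(next));
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::unique_ptr<Runnable>> queue_;   // single queue for all task types
  std::vector<std::thread> workers_;
};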

Page 5

GPU-friendly tasks

• Task = code with clearly defined input/output data which can be executed in a flow
• Independent GPU tasks
  – Fully mapped on the GPU
• Mixed CPU/GPU tasks
  – The GPU kernel result is blocking for the CPU code
• How to deal with the blocking part, which carries the overhead of the memory bus latency?

[Diagram: two Run() flows interleaving CPU code and GPU kernels; in the mixed CPU/GPU case the CPU code following a kernel blocks on its result.]
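An illustrative sketch of a mixed CPU/GPU Run(), with std::async standing in for a real kernel launch and device-to-host copy; the blocking get() is where the memory bus latency shows up.

#include <future>
#include <numeric>
#include <utility>
#include <vector>

std::vector<float> FakeGpuKernel(std::vector<float> in) {
  for (float &x : in) x *= 2.f;   // stands in for the device computation
  return in;                      // stands in for the copy back to the host
}

float MixedCpuGpuRun(std::vector<float> data) {
  for (float &x : data) x += 1.f;                        // CPU part: prepare the input
  std::future<std::vector<float>> result =
      std::async(std::launch::async, FakeGpuKernel, std::move(data));  // "push to GPU"
  // ... independent CPU work could be scheduled here ...
  std::vector<float> out = result.get();                 // CPU part blocked on the kernel result
  return std::accumulate(out.begin(), out.end(), 0.f);
}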

Page 6

Scheduling work for GPU-like resources

• Resource pool of idle threads, controlled by a CPU broker (messaging wake-up)
  – CPU broker policy: resource balancing (N cores -> N active threads)
• Some of the runnables are "GPU friendly"
  – i.e. they contain a part of the Run() processing that has both CPU and GPU implementations
• A CPU thread taking a GPU-friendly runnable -> asks the GPU broker if resources are available
  – If yes, scatter the work and push it to the GPU, then the thread goes to wait/notify; else just run on the CPU
  – … not before notifying the CPU resource broker, which may decide to wake up a thread from the pool
• When the result comes back from the GPU, the thread resumes processing
• At the end of a runnable cycle, the CPU broker corrects the workload
• Keep both CPU and GPU busy, avoiding hyperthreading

[Diagram: a runnable queue feeds the active CPU threads; the CPU broker puts threads of the CPU thread pool to sleep or wakes them up (sleep/wake-up messaging) to balance the idle CPU threads. Runnables with embedded GPU work are pushed to the GPU broker (scatter/gather, low latency); the submitting thread sleeps on push and a Notify() resumes it when the result returns.]
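A sketch of the broker handoff described above, using hypothetical CpuBroker and GpuBroker classes (not the prototype's API): a thread that offloads a GPU-friendly part notifies the CPU broker before blocking, so that an idle pooled thread can be woken while the GPU works.

#include <cstdio>
#include <future>

struct GpuWork   { int n = 0; };            // stands in for scattered input buffers
struct GpuResult { int sum = 0; };          // stands in for gathered kernel output

class GpuBroker {
public:
  bool ResourcesAvailable() const { return free_slots_ > 0; }
  std::future<GpuResult> ScatterAndPush(GpuWork work) {
    // stands in for an asynchronous kernel launch plus gather of the result
    return std::async(std::launch::async, [work] { return GpuResult{work.n * 2}; });
  }
private:
  int free_slots_ = 1;
};

class CpuBroker {
public:
  void NotifyWaiting() { std::puts("worker blocking on GPU: may wake a pooled thread"); }
  void NotifyResumed() { std::puts("worker resumed: rebalance to N active threads"); }
};

GpuResult RunOnCpu(const GpuWork &w) { return GpuResult{w.n * 2}; }  // CPU fallback

GpuResult ProcessGpuFriendlyPart(GpuWork work, GpuBroker &gpu, CpuBroker &cpu) {
  if (!gpu.ResourcesAvailable())
    return RunOnCpu(work);                  // GPU busy: just run on the CPU
  auto fut = gpu.ScatterAndPush(work);      // scatter the work and push it to the GPU
  cpu.NotifyWaiting();                      // not before notifying the CPU broker
  GpuResult r = fut.get();                  // the thread waits until the result is back
  cpu.NotifyResumed();                      // the CPU broker corrects the workload
  return r;
}

int main() {
  GpuBroker gpu;
  CpuBroker cpu;
  GpuResult r = ProcessGpuFriendlyPart(GpuWork{21}, gpu, cpu);
  std::printf("result: %d\n", r.sum);
}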