Status of the vector transport prototype Andrei Gheata 12/12/12.
Status of the vector transport prototype
Andrei Gheata, 12/12/12
Current implementation
[Figure: current implementation data flow. A main scheduler injects/replaces transportable baskets into a work deque; worker threads 0…n take baskets and call Stepping(tid, &tracks), pushing crossing tracks (itrack, ivolume) into track collections. A dispatch & garbage-collect thread takes full track collections, loops over the tracks and pushes them into pick-up baskets by ivolume, recycles baskets and track collections, and injects priority baskets. A digitize & I/O thread handles Generate(Nevents), collects hits, runs Digitize(iev) on priority baskets, and flushes to disk.]
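The worker side of this pipeline can be sketched with standard Java concurrency primitives. This is a minimal sketch, not the prototype's actual code: the Basket fields and the Stepping(tid, &tracks) call follow the diagram labels, while the queue sizes and counters are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BasketPipeline {
    // A transportable basket: the tracks currently collected for one volume.
    static class Basket {
        final int ivolume;
        final List<Integer> tracks = new ArrayList<>();
        Basket(int ivolume) { this.ivolume = ivolume; }
    }

    // Work queue filled by the scheduler and drained by the worker threads.
    static final BlockingQueue<Basket> workQueue = new ArrayBlockingQueue<>(64);
    static final Basket STOP = new Basket(-1);        // poison pill: tells a worker to exit
    static final AtomicInteger processed = new AtomicInteger();

    // Stand-in for the vector stepping call Stepping(tid, &tracks).
    static void stepping(int tid, List<Integer> tracks) {
        processed.incrementAndGet();                  // "transport" one basket
    }

    public static void main(String[] args) throws InterruptedException {
        int nWorkers = 4;
        List<Thread> workers = new ArrayList<>();
        for (int tid = 0; tid < nWorkers; tid++) {
            final int id = tid;
            Thread t = new Thread(() -> {
                try {
                    // Each worker takes baskets until it sees the poison pill.
                    for (Basket b = workQueue.take(); b != STOP; b = workQueue.take())
                        stepping(id, b.tracks);
                } catch (InterruptedException ignored) { }
            });
            t.start();
            workers.add(t);
        }
        // Scheduler side: inject ten baskets, then one STOP per worker.
        for (int iv = 0; iv < 10; iv++) workQueue.put(new Basket(iv));
        for (int i = 0; i < nWorkers; i++) workQueue.put(STOP);
        for (Thread t : workers) t.join();
        System.out.println(processed.get() + " baskets transported");
    }
}
```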
Extending the transport data flow
• Bottlenecks in scheduling?
• Other types of work and resources sharing the concurrency model?
[Figure: extended data flow in the current prototype. The scheduler feeds baskets with tracks to Transport(), producing crossing tracks and hit blocks; Digitize(block) and ProcessHits(vector) turn hit blocks into digits data per buffered event (Ev 0 … Ev n); a deque holds priority events; digits go through I/O buffers to disk.]
Runnable and executor (Java)
• A simple task concurrency model based on a Run() interface
• Single queue management for all the different processing tasks
– Minimizing the overheads of work balancing
– Priority management at the level of the work queue
• In practice, our runnables are transport, digitization, I/O, …
– Lower-level splitting possible: geometry, physics processes, …
• Flow = a runnable producing other runnables that can be processed independently
• Further improvements:
– Scheduling executed by the worker threads themselves (no need for a separate scheduler)
– Workers busy -> the same thread processes its own runnable result(s)
[Figure: runnables, each carrying Data<Type> and a Run() method, are pushed into a concurrent queue <Runnable> managed by the executor; every submitted task yields a Future.]
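The model above maps directly onto java.util.concurrent. A minimal sketch, assuming illustrative task bodies (the real runnables would be Transport(), Digitize(), I/O, …):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleQueueExecutor {
    // One executor and one shared work queue for all task types
    // (transport, digitization, I/O, ...), so work balancing is implicit.
    static final ExecutorService pool = Executors.newFixedThreadPool(4);

    // A tiny "flow": one task whose result seeds the next task.
    static String runFlow() throws Exception {
        Future<Integer> transport = pool.submit(() -> 42);          // stands for a transport task
        int tracks = transport.get();                               // the Future delivers the result
        Future<String> digitize =
            pool.submit(() -> "digitized " + tracks);               // stands for a digitization task
        return digitize.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runFlow());   // -> digitized 42
        pool.shutdown();
    }
}
```

Priority management at the level of the work queue would amount to constructing a ThreadPoolExecutor over a PriorityBlockingQueue of comparable tasks instead of the default FIFO queue.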
GPU-friendly tasks
• Task = code with clearly defined input/output data which can be executed in a flow
• Independent GPU tasks
– Fully mapped on the GPU
• Mixed CPU/GPU tasks
– The GPU kernel result is blocking for the CPU code
• How to deal with the blocking part, which carries the overhead of the memory-bus latency?
[Figure: Run() bodies interleaving CPU code sections with GPU kernels: an independent GPU task maps fully onto kernels, while a mixed task has CPU code blocked on a GPU kernel result.]
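The mixed CPU/GPU shape can be sketched as follows; this is an assumption-laden illustration in which the "GPU" is simulated by a dedicated executor, whereas in the real prototype the kernel would be an actual device launch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MixedTask {
    // Stand-in for the GPU: a dedicated executor running "kernels".
    static final ExecutorService gpu = Executors.newSingleThreadExecutor();

    // A mixed CPU/GPU task: CPU code, then a kernel whose result
    // blocks the remaining CPU code (the latency-sensitive part).
    static int run(int input) throws Exception {
        int staged = input + 1;                                 // CPU code: prepare the data
        Future<Integer> kernel = gpu.submit(() -> staged * 2);  // GPU kernel (simulated)
        int result = kernel.get();                              // blocking: bus latency sits here
        return result - 1;                                      // CPU code: post-process
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(10));  // (10+1)*2-1 = 21
        gpu.shutdown();
    }
}
```

The blocking `kernel.get()` is exactly the point where the next slide's broker scheme steps in: instead of idling, the thread could yield the core to another runnable until the result returns.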
Scheduling work for GPU-like resources
• Resource pool of idle threads, controlled by a CPU broker (messaging wake-up)
– CPU broker policy: resource balancing (N cores -> N active threads)
• Some of the runnables are “GPU friendly”
– i.e. they contain a part of the Run() processing having both CPU and GPU implementations
• A CPU thread taking a GPU-friendly runnable asks the GPU broker if resources are available
– If yes, it scatters the work and pushes it to the GPU, then goes to wait/notify; otherwise it just runs on the CPU
– … but not before notifying the CPU resource broker, which may decide to wake up a thread from the pool
• When the result comes back from the GPU, the thread resumes processing
• At the end of a runnable cycle, the CPU broker corrects the workload
• Goal: keep both CPU and GPU busy while avoiding hyperthreading
[Figure: a CPU broker puts threads from the CPU thread pool to sleep or wakes them up (Notify(), Resume); active CPU threads consume the runnable queue; runnables with embedded GPU work go through a low-latency GPU broker doing scatter/gather.]
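The GPU-broker decision (offload if a slot is free, else fall back to the CPU implementation) can be sketched as below. This is a simplified assumption-based sketch: capacity is one in-flight kernel, the "GPU" is again a simulated executor, and the CPU-broker notification that wakes a pooled thread while the submitter waits is omitted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class GpuBroker {
    // Limited "GPU" capacity: at most one kernel in flight (assumed).
    private final Semaphore slots = new Semaphore(1);
    private final ExecutorService gpu = Executors.newSingleThreadExecutor();

    // Ask the broker for a GPU slot: returns a Future if granted,
    // or null, in which case the caller falls back to its CPU code.
    Future<Integer> tryOffload(Callable<Integer> kernel) {
        if (!slots.tryAcquire()) return null;        // GPU busy -> caller runs on CPU
        return gpu.submit(() -> {
            try { return kernel.call(); }
            finally { slots.release(); }             // free the slot on completion
        });
    }

    // A GPU-friendly runnable: same computation, two implementations.
    static int runFriendly(GpuBroker broker, int x) throws Exception {
        Future<Integer> f = broker.tryOffload(() -> x * x);  // GPU path
        if (f != null) return f.get();   // block until the result returns, then resume
        return x * x;                    // CPU fallback implementation
    }

    public static void main(String[] args) throws Exception {
        GpuBroker broker = new GpuBroker();
        System.out.println(runFriendly(broker, 7));   // 49 on either path
        broker.gpu.shutdown();
    }
}
```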