MG/ME on GPU - KIASworkshop.kias.re.kr/MGLP/?download=GPU.pdf · Presented by J. Kanzaki at KIAS in...

MG/ME on GPU Junichi Kanzaki (KEK) KIAS School on MadGraph for LHC Physics @ Korea Institute For Advanced Study, Oct. 29, 2011


Page 1

MG/ME on GPU

Junichi Kanzaki (KEK)

KIAS School on MadGraph for LHC Physics

@ Korea Institute For Advanced Study

Oct. 29, 2011

Page 2

Contents

•Introduction

•GPU

•Development and test of HEGET

•MC integration

•Event generation

•PGS4

•Brief Summary & Prospects

Page 3

Motivation

•Increase of the amount of LHC data:
 - about 50 pb^-1 in 2010 -> 220 TB/day in 2010
 - 5 fb^-1 in 2011
 - simulation data for physics analysis

•GRID: use CPU resources around the world
 - it takes weeks to reprocess accumulated real data

•Storage is also a serious problem.

Page 4

More Speed ...

•Reduction of data processing time -> enormous impact not only on global data processing but also on personal analysis environments

•CPU clocks ≤ 4GHz -> multi-core: 8 (~>12) cores

•CPU farms
 - local CPU farm -> large, expensive
 - GRID <- unifies local CPU farms

•Another way of parallelization: GPU
 - high order of parallelization: ~500-1000
 - good cost performance

Page 5

GFLOPs

[Plot: GFLOPs over time for NVIDIA GPUs (single and double precision) and Intel CPUs (single and double precision).]

Page 6

Overview

•Since the beginning of 2008, we have been working on the development of GPU code to improve the performance of HEP software.

•We developed HEGET from HELAS for the computation of helicity amplitudes on GPU.

•Basic tests of HEGET functions were done with the QED (n-photon), QCD (n-jet) and more general SM processes with massive particles.

•VEGAS/BASES and SPRING

•PGS4

Page 7

Publications

•Our GPU application is the first such example in HEP software:

•QED - K. Hagiwara, J. Kanzaki, N. Okamura, D. Rainwater and T. Stelzer, “Fast calculation of HELAS amplitudes using graphics processing unit (GPU)", Eur. Phys. J. C66 (2010) 477.

•QCD - K. Hagiwara, J. Kanzaki, N. Okamura, D. Rainwater and T. Stelzer, “Calculation of HELAS amplitudes for QCD processes using graphics processing unit (GPU)", Eur. Phys. J. C70 (2010) 513.

•SM - finalizing the draft

•VEGAS/BASES - J. Kanzaki, “Monte Carlo integration on GPU”, Eur. Phys. J. C71 (2011) 1559.

•SPRING - in preparation

Page 8

Computing Environment

Host PC:

CPU: Core i7 2.67GHz
L2 Cache: 8MB
Memory: 6GB
Bus Speed: 1.333GHz
OS: Fedora 10 (64bit)

Page 9

GPU

Page 10

Graphics Card

•GTX285 (2GB memory): ~500 euro

Page 11

Application of GPU

•GPU (Graphics Processing Unit): used for high-performance output of graphics data (e.g. 3D graphics) to the PC screen.

•Mainly manufactured by NVIDIA and AMD/ATI. NVIDIA provides the CUDA SDK, which enables us to write code for the GPU in C/C++.

•The CUDA SDK makes applying the GPU to general-purpose computing very easy.

•Various applications to general computing already exist in science/physics: astrophysics, fluid dynamics, etc.

Page 12

Our GPUs

                      GTX580    GTX285    GTX280    9800GTX
Multi Processors      16        30        30        16
CUDA Cores            512       240       240       128
Global Memory         1.5GB     2GB       1GB       500MB
Constant Memory       64KB      64KB      64KB      64KB
Shared Memory/block   48KB      16KB      16KB      16KB
Registers/block       32768     16384     16384     8192
Warp Size             32        32        32        32
Clock Rate            1.54GHz   1.30GHz   1.30GHz   1.67GHz

Page 13

Architecture of GTX580 (GF100)

•16 Streaming Multiprocessors (SM)

•One SM has 32 CUDA Cores -> 16x32 = 512 cores in total

[Diagram: layout of a Streaming Multiprocessor (SM).]

Page 14

Thread < Thread Block < Grid

•Thread: a unit of execution. All threads execute the same kernel program.

•Thread block: a batch of threads. Threads in a block can:
 - share data with each other
 - synchronize their execution

•Grid: a set of thread blocks, executed in a single kernel call.

•Threads and blocks have their own IDs.

[Diagram: a grid of thread blocks; each block is a 2D array of threads indexed by (x, y).]
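The thread/block IDs above are what each thread uses to find its data. As a rough illustration, a 1D grid amounts to the following serial loop (plain CPU C, not CUDA; the names only mirror the CUDA built-ins):

```c
#include <assert.h>

/* Serial C sketch of a 1D CUDA grid: each (block, thread) pair
   gets a unique global index i = blockIdx * blockDim + threadIdx,
   and every "thread" runs the same kernel function on its index. */
void run_grid(int num_blocks, int threads_per_block,
              void (*kernel)(int i, void *arg), void *arg)
{
    for (int block_idx = 0; block_idx < num_blocks; ++block_idx)
        for (int thread_idx = 0; thread_idx < threads_per_block; ++thread_idx)
            kernel(block_idx * threads_per_block + thread_idx, arg);
}

/* Example "kernel": mark which global indices were visited. */
void mark_visited(int i, void *arg)
{
    ((int *)arg)[i] = 1;
}
```

On a real GPU the two loops disappear: every (block, thread) pair runs concurrently.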

Page 15

Memory Access

•Each thread can access:
 - registers: fast read/write, per-thread
 - local memory: slow read/write, per-thread
 - shared memory: fast read/write, per-block

•CPU <-> GPU data transfer:
 - global memory: read/write, per-grid
 - constant memory: read-only, per-grid

[Diagram: CUDA memory hierarchy — per-thread local memory, per-block shared memory, and global memory accessible across grids.]

Page 16

Programming Model

•CUDA - NVIDIA's SDK for GPU programming: C/C++ plus some directives.

•From the C program executed on the host CPU, kernels are called with parameters:

  Kernel<<<dimGrid, dimBlock>>>(ptrGlobalMemory, ...);

Serial code executes on the host while parallel code executes on the device.

[Diagram: sequential execution of a C program — serial code runs on the host (CPU), while parallel kernels (Kernel0<<<>>>(), Kernel1<<<>>>()) run on the device (GPU) as grids of thread blocks.]

Page 17

Very Simple Example

•Add two vectors, A and B, on GPU: C = A + B.

(From the CUDA C Programming Guide, Version 4.0, Chapter 2 "Programming Model"; the full code can be found in the vectorAdd SDK code sample.)

CUDA C extends C by allowing the programmer to define C functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C functions.

A kernel is defined using the __global__ declaration specifier, and the number of CUDA threads that execute that kernel for a given kernel call is specified using the <<<...>>> execution configuration syntax. Each thread that executes the kernel is given a unique thread ID that is accessible within the kernel through the built-in threadIdx variable.

As an illustration, the following sample code adds two vectors A and B of size N (N ≤ 1024) and stores the result into vector C:

  // Kernel definition
  __global__ void VecAdd(float* A, float* B, float* C)
  {
      int i = threadIdx.x;  // i: thread number (built-in variable)
      C[i] = A[i] + B[i];
  }

  int main()
  {
      ...
      // Kernel invocation with N threads
      VecAdd<<<1, N>>>(A, B, C);
  }

Here, each of the N threads that execute VecAdd() performs one pair-wise addition. In general a kernel launch has the form KernelFunc<<<no_of_blocks, threads_per_block>>>(ptrGlobalMem).

Page 18

Development and test of HEGET

Page 19

HEGET

•HELAS (FORTRAN) -> HEGET (C) for GPU.

•A test program for HEGET calculates the total cross sections of physics processes.

•Results are compared with MG/ME and with independent FORTRAN programs using BASES.

•Event process times are compared between GPU and CPU.

•QED n-photon production processes: GTX280 / CUDA 2.1

•QCD n-gluon production processes: GTX280 / CUDA 2.1

•SM processes: GTX285 / CUDA 2.3

Page 20

QED & QCD processes

Page 21

QED Processes

•Construction of the GPU computation system, development of the HEGET functions, and their validation.

•uu~ -> n-photons

•|ηγ|<2.5, pTγ>20GeV, ΔRγγ>0.4

•Two types of amplitude programs:
 - conversion of "matrix.f"
 - hand-written amplitudes with permutations of all photons

Page 22

Amplitude Division

•For Nγ ≥ 6, the size of the "matrix.f" amplitude is too large for CUDA.

•Divide the amplitude into smaller pieces -> execute them serially as different kernels.

# photons   # diagrams = (# photons)!
2           2
3           6
4           24
5           120
6           720
7           5040
8           40320

Page 23

Event Process Time (QED)

Preliminary

[Plot: event process time per event on GTX280 for uu~ -> n photons (n = 2-8), spanning ~10^-2 to 10^4 sec; CPU vs GPU curves for the Permutation, MadGraph and MadGraph (divided) amplitudes.]

Page 24

Ratio of Process Time (QED)

[Plot: ratio of process time CPU/GPU(GTX280) for uu~ -> n photons (n = 2-8); Permutation, MadGraph and MadGraph (divided) amplitudes, with ratios up to ~180.]

Page 25

Comparison of distributions

•uu~ -> 5 photons

[Plots: maximum photon pT (0-800 GeV) and ΔRγγ distributions, comparing HEGET, MadGraph/MadEvent and BASES.]

Page 26

Effect of Unrolling Loops (GTX280)

[Plot: effect of unrolling loops — process-time ratio unrolled/no-unrolling for 3-8 photons; Unroll One Perm, Unroll Two Perm, MadGraph and MadGraph (divided), with ratios ~0.1-0.9.]

Page 27

Double Precision Support (GTX280)

[Plot: ratio of process time, double/single precision, for 2-5 photons; Permutation and MadGraph amplitudes, with ratios up to ~4.5.]

Page 28

Various GPUs

[Plot: process-time ratio versus GTX280 for the permutation amplitude, 2-5 photons; 8800M GTS (iMac) and 9800GTX.]

Page 29

QCD Processes

•uu~ -> n-gluons, gg -> n-gluons and uu -> uu + gluons

•|ηj|<2.5, pTj>20GeV, pTjj>20GeV

•Qren = Qfac = 20GeV

•Color matrix multiplication is decomposed: multiplications with the same factors are grouped to reduce the number of multiplications.

•The "gg -> 5g" program can be compiled but cannot be executed on the GPU due to its size.

Page 30

QCD Processes

# final jets   gg -> gluons         uu~ -> gluons        uu -> uu+gluons
               #diagram   #color    #diagram   #color    #diagram   #color
2              6          6         3          2         2          2
3              45         24        18         6         10         8
4              510        120       159        24        76         40
5              7245       720       1890       120       786        240

Page 31

Event Process Time (QCD)

[Plot: event process time on GTX280 for QCD processes (gg, uu~ and uu initial states) with 2-5 final-state jets, spanning ~10^-2 to 10^3 sec/event; CPU vs GPU.]

Page 32

Ratio of Process Time (QCD)

[Plot: ratio of process time CPU/GPU(GTX280) for gg, uu~ and uu processes with 2-5 final-state jets, with ratios up to ~180.]

Page 33

SM processes

Page 34

SM Processes

•List of processes:

-W+4jets: ud~->W++ng, ug->W+d+ng, uu->W+ud+ng, gg->W+du~+ng

-Z+4jets: uu~->Z+ng, ug->Zu+ng, uu->Zuu+ng, gg->Zuu~

-WW+3jets: uu~->W+W-+ng, ug->W+W-u+ng, uu->W+W-uu, uu->W+W+dd, gg->W+W-uu~

-WZ+3jets: ud~->W+Z+ng, ug->W+Zd+ng, uu->W+Zud, gg->W+Zdu~

-ZZ+3jets: uu~->ZZ+ng, ug->ZZu+ng, uu->ZZuu, gg->ZZuu~

Page 35

SM Processes (cont'd)

•List of processes:

-tt~+3jets:uu~->tt~+ng, ug->tt~u+ng, uu->tt~uu+ng, gg->tt~+ng

-HW+3jets:ud~->HW+ng, ug->HWd+ng, uu->HWud+ng, gg->HWdu~+ng

-HZ+3jets:uu~->HZ+ng, ug->HZu+ng, uu->HZuu+ng, gg->HZuu~+ng

-Htt~+2jets: uu~->Htt~+ng, ug->Htt~u+ng, uu->Htt~uu, gg->Htt~+ng

-H(WBF)+2jets:ud->Hud+ng, uu->Huu+ng, ug->Hudd~+ng, gg->Huu~+dd~

-HH+3jets and HHH+2jets:ud->HHud+ng, uu->HHuu+ng, ud->HHHud, uu->HHHuu

Page 36

SM Processes

•Generation of random numbers on GPU.

•Decays of W, Z, t and H: W->lν (l=e,µ), Z->ll (l=e,µ), t->W(->lν)b, H->τ+τ-

•Lepton: pTl>20GeV, |ηl|<2.5

•b-jets: pTb>20GeV, |ηb|<2.5

•Light quark jets: pTj>20GeV, |ηj|<5

•Separation of jets: pTjj>20GeV

•Qren = Qfac = MZ

•BW width factor = 20

Page 37

Ratio of Process Time (SM) (GTX285)

[Plots: ratio of process time CPU/GPU(GTX285) vs number of final-state jets for W+jets, tt~+jets, WW+jets and Htt~+jets, broken down by subprocess; ratios up to ~150.]

Page 38

New GTX580

[Plot: CPU/GPU ratio of processing time for ud~ -> W+(-> µ+ νµ) + jets (0-4 jets); GTX580 vs GTX285, with GTX580 reaching ~250.]

•Number of CUDA cores is doubled. Hence the performance of programs on GPU is also roughly doubled.

Page 39

New GPU (Double/Single)

•Double precision support is improved. Better support is provided by TESLA, a board specialized for GPGPU.

[Plot: ratio of process time (double/single precision) for 2-5 photons; GTX580 vs GTX280, MadGraph (MG) and Permutation (Perm) amplitudes.]

Page 40

MC integration on GPU

Page 41

Application of GPU to Practical Programs

•Application of GPU to more general programs -> acceleration of MC integration programs.

•MC integration: generate many independent points in a multi-dimensional phase space and evaluate the function value at each point -> can be easily parallelized.

•Developed GPU versions of VEGAS and BASES; test processes:

  ud~ -> W+ (->µ+νµ) + n-gluons (n=0~4)

Compare cross sections and process times.
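The "many independent points" structure that makes MC integration parallelizable can be sketched in plain C. This is only an illustration, not the VEGAS/BASES code; the function names are made up, and a tiny LCG replaces the GPU random-number generator for reproducibility:

```c
#include <assert.h>
#include <math.h>

/* Minimal MC integration sketch: sample n independent points in
   [0,1], evaluate f at each (this loop is what gets parallelized
   on the GPU, one point per thread), and average. */
static unsigned long long lcg = 12345ULL;

static double uniform01(void)
{
    /* 64-bit LCG (Knuth's MMIX constants); top 53 bits -> [0,1) */
    lcg = lcg * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(lcg >> 11) / 9007199254740992.0;  /* / 2^53 */
}

double mc_integrate(double (*f)(double), long n)
{
    double sum = 0.0;
    for (long i = 0; i < n; ++i)   /* points are independent */
        sum += f(uniform01());
    return sum / (double)n;        /* estimate of the integral over [0,1] */
}

static double f_sq(double x) { return x * x; }  /* test integrand */
```

With f(x) = x^2 the estimate converges to 1/3; the per-point independence is exactly why the evaluation loop maps onto one-thread-per-point on the GPU.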

Page 42

Program Development

•Convert FORTRAN programs into C.

•Modify the program structure for GPU parallelization -> GPU versions of VEGAS and BASES.

•Computations of function values at each phase-space point are parallelized on the GPU.

•We compare the results and performance of three versions of the programs:

•FORTRAN (original)

•C (converted from FORTRAN)

•CUDA (GPU)

Page 43

Parameters of MC integration

•NCALL: number of points generated at each iteration step.

•ITMX: maximum number of iterations. For BASES, iterations are divided into two phases: the "Grid Optimization Step (ITMX1)" and the "Integration Step (ITMX2)".

•ACC: required accuracy at each iteration step. The program terminates when this accuracy is reached. (For BASES it can be applied to each iteration phase: ACC1 and ACC2.)

Page 44

Parameters of MC integration

•ACC is kept small so that all iterations are executed. -> ACC = 10^-3 %

•ITMX = ITMX1 + ITMX2 = 10

•NCALL is chosen so that the accuracy of the total cross section becomes 0.1%.

No. of gluons   NCALL   ITMX   ITMX1   ITMX2
0               10^7    10     5       5
1               10^8    10     5       5
2               10^9    10     5       5
3               10^10   10     5       5
4               10^10   10     5       5

Page 45

Ratio of Total Process Time

[Plot: total process-time ratio of BASES for ud~ -> W+ + n gluons (n = 0-4): FORTRAN/GTX580, C/GTX580, FORTRAN/GTX285 and C/GTX285, with ratios up to ~140.]

Page 46

GTX580 (Performance ratios)

•Improvement by new GPU itself ≈ 2.

[Plot: GTX285/GTX580 ratio of total process time on GPU vs number of final-state gluons, for the SM amplitude, BASES and SPRING; the ratio is around 2.]

Page 47

Event generation on GPU

Page 48

Event Generation by SPRING

•SPRING: a companion software package of BASES -> generates unweighted events based on the BASES output file.

•A given number of events is allocated to hyper-cells in proportion to the value of the integral in each cell.

•In each cell, “acceptance-rejection” is performed for each event with a set of random numbers. -> if failed, try another set.
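The per-cell acceptance-rejection step can be sketched as below. This is illustrative C, not the SPRING code; the LCG and the test density f(x) = 2x are hypothetical:

```c
#include <assert.h>
#include <math.h>

/* Acceptance-rejection sketch: draw x uniformly and accept it with
   probability f(x)/fmax; on failure, try another set of random
   numbers, exactly as described for SPRING above. */
static unsigned long long ar_lcg = 98765ULL;

static double ar_uniform(void)
{
    ar_lcg = ar_lcg * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(ar_lcg >> 11) / 9007199254740992.0;  /* [0,1) */
}

/* Generate one value distributed as f on [0,1], given max f = fmax. */
double accept_reject(double (*f)(double), double fmax)
{
    for (;;) {
        double x = ar_uniform();
        if (ar_uniform() * fmax <= f(x))  /* accepted */
            return x;
        /* failed: try another set of random numbers */
    }
}

static double linear_density(double x) { return 2.0 * x; }  /* test density */
```

For f(x) = 2x the generated values have mean 2/3; the number of failed trials per event is what varies from cell to cell, which motivates the "thread recycling" scheme on the next slide.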

Page 49

SPRING on GPU (gSPRING)

•One thread takes care of the generation of one event. -> generation in an inefficient cell determines the total performance.

•"Thread Recycling": one "acceptance-rejection" trial per kernel call. -> generated events are removed, and failed events are multiplied to fill all vacant threads. -> repeat until all events are successfully generated.
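A serial sketch of the "thread recycling" bookkeeping (all names are hypothetical and it assumes at most 64 slots; on the GPU each pass would be one kernel call with one trial per thread):

```c
#include <assert.h>

/* Thread-recycling sketch: every pass runs one trial per slot.
   Accepted slots emit an event and are removed; the failed slots'
   cells are compacted and duplicated so that the next pass again
   runs with all slots busy. Returns the number of passes needed. */
int recycle_generate(int nslots, int nwanted,
                     int (*trial)(int cell, int pass),  /* 1 = accepted */
                     int *cells,    /* nslots cell ids, reused as work array */
                     int *events)   /* out: nwanted accepted cell ids */
{
    int got = 0, pass = 0;
    while (got < nwanted) {
        int failed[64], nfail = 0;          /* assumes nslots <= 64 */
        for (int s = 0; s < nslots && got < nwanted; ++s) {
            if (trial(cells[s], pass))
                events[got++] = cells[s];   /* event generated: remove */
            else
                failed[nfail++] = cells[s]; /* keep for recycling */
        }
        /* multiply failed cells to fill all vacant slots */
        for (int s = 0; s < nslots; ++s)
            cells[s] = nfail ? failed[s % nfail] : cells[s];
        ++pass;
    }
    return pass;
}

/* Toy trial for testing: everything fails on the first pass only. */
static int demo_trial(int cell, int pass) { (void)cell; return pass > 0; }
```

The compaction step is the point: without it, a thread whose cell keeps failing would idle while the slowest cell determines the total run time.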

Page 50

Event Generation by SPRING

•For the test of SPRING the same process as the BASES test is used:

  ud~ -> W+ (->mu+ vm) + n-gluons (n=0~4).

•Compare FORTRAN, C and GPU versions of SPRING program.

•Generate 10^6 events and compare the performance.

Page 51

Generated distributions

•ud~ -> W+ (->µ+νµ) + 3 gluons (10^6 events)

•x1 (energy fraction of u):

[Plots: x1 distribution for the Fortran, C and GPU versions, with C/Fortran and GPU/Fortran ratios flat around 1 (0.8-1.2 scale).]

Page 52

Generated distributions

•pT (µ+):

[Plots: muon pT distribution (0-100 GeV) for the Fortran, C and GPU versions, with C/Fortran and GPU/Fortran ratios flat around 1.]

Page 53

Generated distributions

•η (µ+):

[Plots: muon η distribution (-5 to 5) for the Fortran, C and GPU versions, with C/Fortran and GPU/Fortran ratios flat around 1.]

Page 54

Generated distributions

•pT (gluon):

[Plots: gluon pT distribution (0-100 GeV) for the Fortran, C and GPU versions, with C/Fortran and GPU/Fortran ratios flat around 1.]

Page 55

Generated distributions

•η (gluon):

[Plots: gluon η distribution (-5 to 5) for the Fortran, C and GPU versions, with C/Fortran and GPU/Fortran ratios flat around 1.]

Page 56

SPRING performance

•Total execution time [sec]:

No. of gluons   FORTRAN   C        GTX580   GTX285
0               9.72      5.80     0.346    0.411
1               43.2      26.7     0.768    0.994
2               4224.8    2966.7   26.53    42.58
3               ***       32292    267.0    297.9

Page 57

Ratio of process time (GTX580)

[Plot: process-time ratios for ud~ -> W+ + n gluons (n = 0-3): SPRING FORTRAN/GTX580 and C/GTX580, BASES FORTRAN/GTX580 and C/GTX580, with ratios up to ~150.]

Page 58

PGS on GPU

Page 59

PGS

•PGS version 4: rewrite the FORTRAN code in C; develop the GPU version based on the C program (single precision). -> one event per thread: "Event Parallelization"

•Prepare particle events after parton showering and decay/fragmentation with Pythia as input (binary).

•Sample processes (LHC@7TeV):

-ud~ -> W-(->mu-vm~) + (0~4)-gluons

-pp -> tt~ -> W-(->mu-vm~) b~ W+(->jj) b

•Compare total performance including time for event I/O to/from external files (LHCO text files as output).

Page 60

Process time for FORTRAN and C

•Process time per event with 10000 tt~ events [msec]:

            PGS     Event I/O
FORTRAN     47.66   0.35 (0.7%)
C           40.33   0.14 (0.35%)

[Plots: execution time per event for W+ng and tt~ samples, FORTRAN and C versions, split into PGS processing and event I/O.]

Page 61: MG/ME on GPU

Process time for FORTRAN and C

•The C programs run faster than the FORTRAN ones (as usual), and event I/O in C is faster than in FORTRAN by a factor of two for the same binary data.

•The I/O fraction is less than 1%. -> Total performance could be improved by a factor of ~100 with the GPU!

-> but ...

Page 62: MG/ME on GPU

Process time for FORTRAN and C

•Access to calorimeter data is very slow ...

•PGS expands calorimeter data as a large array of cells with eta x phi = (320x200) (default). -> Almost all cells have zero energies ...

•Cell energies are checked late in the loops on eta and phi cell numbers. -> Modify to check energies first.

•Modify calorimeter data structure from a large array to a list of cell energies. <- intended to reduce local memory size for GPU version.

Page 63: MG/ME on GPU

Improvement of CPU programs

•Total performance is greatly improved by simply checking cell energies first.

•Further improvement is possible by the change of calorimeter data structure.

[Bar charts: process time / event for W+0-4 gluons and tt~. FORTRAN (0-60 msec): original vs. "check CAL energy first". C (0-50 msec): original vs. "check CAL energy first" vs. new CAL data structure.]

Page 64: MG/ME on GPU

•Process time / event with 10000 tt~ events [msec]

FORTRAN          PGS     I/O
Original         47.66   0.35 (0.7%)
Energy Check      4.55   0.38 (7.7%)

C                PGS     I/O
Original         40.33   0.14 (0.35%)
Energy Check      1.99   0.14 (6.6%)
Data Structure    1.00   0.13 (11.4%)

Improvement of CPU programs

Page 65: MG/ME on GPU

Performance of C program

•Expected improvement factor by GPU becomes less than 10.

[Bar chart: execution time / event [msec] (0-1.2) for W+0-4 gluons and tt~ with the improved C program, split into PGS and event I/O parts.]

Page 66: MG/ME on GPU

Issues for the GPU version

•Limit of size of local memories: 512KB/thread.

•Possible solutions:

-Put large data on the global memory and access them each time.

-Change the data structure to minimize its size. -> also improves performance of the CPU programs

-> Developed the GPU version of PGS with the modified data structure for the calorimeter.

Page 67: MG/ME on GPU

Compare distributions (mu)

[Histograms: muon P_T [GeV, 0-300] and eta (-3 to 3) in tt~ events, GPU (top row) vs. CPU (bottom row).]

Page 68: MG/ME on GPU

[Histograms: jet P_T [GeV, 0-300] and eta (-5 to 5) in tt~ events, GPU vs. CPU.]

Compare distributions (jet)

Page 69: MG/ME on GPU

Improvement by GPU

[Bar chart: ratio of execution time, C (fast) / GPU, 0-7, for W+0-4 gluons and tt~.]

•Obtained about a factor of 7 for processes with complex final states.

Page 70: MG/ME on GPU

•Due to the overhead for the data transfers between host and GPU, this improvement factor is consistent with the expectation.

•Process time / event with tt~ events [msec]

                     PGS     I/O
FORTRAN (original)   47.66   0.35 (0.7%)
C (fast code)         1.00   0.13 (11.4%)
GPU                   0.017  0.146 (90%)

The total time of the GPU version is dominated by the data transfer between CPU and GPU.

Improvement by GPU

Page 71: MG/ME on GPU

[Bar chart: ratio of execution time, FORTRAN (slow) / GPU, 0-400, for W+0-4 gluons and tt~.]

•Improvement is very large compared with the original FORTRAN program ....

Improvement by GPU

Page 72: MG/ME on GPU

•Process time ratio only for the PGS part is reasonable.

PGS performance ratio

[Bar chart: ratio of execution time, C (fast) / GPU (PGS part only), 0-70, for W+0-4 gluons and tt~.]

Page 73: MG/ME on GPU

Brief Summary & Prospects

•For the integration of GPU programs to the MG/ME system ...

-Component programs are almost ready. -> Next step: develop an efficient system to handle the multi-subprocess case.

•Slides will be uploaded soon.

•I will summarize how to use the GPU installed in a MacBook (Pro).