CS 179: Lecture 2 Lab Review 1

Page 1: CS 179:  Lecture 2 Lab Review  1

CS 179: Lecture 2 Lab Review 1

Page 2: CS 179:  Lecture 2 Lab Review  1

The Problem

Add two arrays:

A[] + B[] -> C[]

Page 3: CS 179:  Lecture 2 Lab Review  1

GPU Computing: Step by Step

1. Set up inputs on the host (CPU-accessible memory)
2. Allocate memory for inputs on the GPU
3. Copy inputs from host to GPU
4. Allocate memory for outputs on the host
5. Allocate memory for outputs on the GPU
6. Start GPU kernel
7. Copy output from GPU to host

(Copying can be asynchronous)
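The steps above can be sketched in host code roughly as follows. This is a minimal sketch, not the slide's original listing: the names `addKernel` and `addOnGpu` are hypothetical, the block size of 256 is an assumption, both inputs are assumed to have the same length, and error checking on the CUDA calls is omitted for brevity. It uses blocking `cudaMemcpy` rather than the asynchronous variant.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per element; the guard keeps spare threads in the last block idle.
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: returns a + b, computed on the GPU.
// Assumes a.size() == b.size(); CUDA error checks are omitted.
std::vector<float> addOnGpu(const std::vector<float> &a,
                            const std::vector<float> &b) {
    // Step 1: the inputs are already set up on the host.
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    if (n == 0) return {};

    // Steps 2-3: allocate GPU memory for the inputs and copy them over.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    // Steps 4-5: allocate the output on the host and on the GPU.
    std::vector<float> c(n);
    cudaMalloc(&dC, bytes);

    // Step 6: start the kernel with enough blocks to cover all n elements.
    int threads = 256;  // assumed block size
    int blocks = (n + threads - 1) / threads;
    addKernel<<<blocks, threads>>>(dA, dB, dC, n);

    // Step 7: copy the result back; this blocking copy waits for the kernel.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

Calling `addOnGpu({1, 2, 3}, {10, 20, 30})` should give back a three-element vector holding the elementwise sums.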

Page 4: CS 179:  Lecture 2 Lab Review  1

The Kernel

Determine a thread index from the block ID and the thread ID within a block:
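The slide's own listing did not survive in this transcript. As an illustration of the usual one-dimensional index calculation, the sketch below has every thread write its own global index into an array. The names `writeIndexKernel` and `globalIndices` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes its global index from its block ID, the block size,
// and its thread ID within the block, then records that index.
__global__ void writeIndexKernel(int *out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = idx;
}

// Hypothetical helper: launches blocks * threadsPerBlock threads and returns
// the index each one computed. The array has exactly one slot per thread,
// so no bounds check is needed here.
std::vector<int> globalIndices(int blocks, int threadsPerBlock) {
    int n = blocks * threadsPerBlock;
    std::vector<int> host(n);
    int *dev;
    cudaMalloc(&dev, n * sizeof(int));
    writeIndexKernel<<<blocks, threadsPerBlock>>>(dev);
    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

With 4 blocks of 8 threads, thread 0 of block 1 computes index 1 * 8 + 0 = 8, so consecutive blocks cover consecutive runs of the array.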

Page 5: CS 179:  Lecture 2 Lab Review  1

Calling the Kernel

Page 6: CS 179:  Lecture 2 Lab Review  1

CUDA implementation (2)

Page 7: CS 179:  Lecture 2 Lab Review  1

Fixing the Kernel

For large arrays, our kernel doesn’t work!

Bounds-checking – be on the lookout!

Also, need a way for the kernel to handle a few more elements…
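One common way to address both points at once is a strided loop: each thread starts at its global index, checks it against the array length, and then steps forward by the total number of threads in the grid. The sketch below assumes that approach; it may not match the listings on the following slides, which are missing from this transcript. The names `addStrideKernel` and `addWithFixedGrid` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Bounds check and "more than one element per thread" in a single loop:
// the condition i < n is the bounds check, and the stride lets a small
// grid walk over an arbitrarily large array.
__global__ void addStrideKernel(const float *a, const float *b, float *c, int n) {
    int stride = gridDim.x * blockDim.x;  // total threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: adds two equal-length arrays using a caller-chosen
// launch size, which may be far smaller than the array.
std::vector<float> addWithFixedGrid(const std::vector<float> &a,
                                    const std::vector<float> &b,
                                    int blocks, int threads) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    std::vector<float> c(n);
    if (n == 0) return c;

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    addStrideKernel<<<blocks, threads>>>(dA, dB, dC, n);

    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

For example, `addWithFixedGrid(a, b, 2, 32)` launches only 64 threads, yet a 1000-element input is still fully processed because each thread handles roughly 16 elements.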

Page 8: CS 179:  Lecture 2 Lab Review  1

Fixing the Kernel – Part 1

Page 9: CS 179:  Lecture 2 Lab Review  1

Fixing the Kernel – Part 2

Page 10: CS 179:  Lecture 2 Lab Review  1

Fixing our Call

Page 11: CS 179:  Lecture 2 Lab Review  1

Lab 1!

Sum of polynomials – Fun, parallelizable example!

Suppose we have a polynomial P(r) with coefficients c_0, …, c_{n-1}, given by:

P(r) = \sum_{i=0}^{n-1} c_i r^i

We want, for r_0, …, r_{N-1}, the sum:

\sum_{j=0}^{N-1} P(r_j)

Output condenses to one number!

Page 12: CS 179:  Lecture 2 Lab Review  1

Calculating P(r) once

Pseudocode (one possible method):

Given r, coefficients[]
result <- 0.0
power <- 1.0
for all coefficient indices i from 0 to n-1:
    result += (coefficients[i] * power)
    power *= r
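The pseudocode above translates almost line for line into a function that can run on either the host or the GPU. This is a sketch: the name `evalPoly` and the choice of `float` are assumptions, not taken from the lab handout.

```cuda
#include <cuda_runtime.h>

// Evaluates P(r) = coefficients[0] + coefficients[1]*r + ... +
// coefficients[n-1]*r^(n-1) by keeping a running power of r.
// Marked __host__ __device__ so the same code can be called from a kernel
// or tested directly on the CPU.
__host__ __device__ float evalPoly(float r, const float *coefficients, int n) {
    float result = 0.0f;
    float power = 1.0f;  // r^0
    for (int i = 0; i < n; ++i) {
        result += coefficients[i] * power;
        power *= r;      // advance to r^(i+1)
    }
    return result;
}
```

For coefficients {1, 2, 3} and r = 2, the loop accumulates 1 + 2*2 + 3*4 = 17.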

Page 13: CS 179:  Lecture 2 Lab Review  1

Accumulation

atomicAdd() function

Important for safe operations!
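As a sketch of why `atomicAdd()` matters here: if many threads each add their own P(r_j) into one output location with a plain `+=`, updates can be lost, whereas `atomicAdd()` performs each read-modify-write as one indivisible step. The names `polyAt`, `polySumAtomicKernel`, and `polySumAtomic` below are hypothetical, the block size of 256 is an assumption, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Evaluates the polynomial at one point (same method as the pseudocode).
__device__ float polyAt(float r, const float *coeffs, int n) {
    float result = 0.0f, power = 1.0f;
    for (int i = 0; i < n; ++i) {
        result += coeffs[i] * power;
        power *= r;
    }
    return result;
}

// Every thread adds its values of P(r_j) straight into *sum.
// atomicAdd makes each update safe when threads collide on the same address.
__global__ void polySumAtomicKernel(const float *rs, int numPoints,
                                    const float *coeffs, int n, float *sum) {
    int stride = gridDim.x * blockDim.x;
    for (int j = blockIdx.x * blockDim.x + threadIdx.x; j < numPoints; j += stride) {
        atomicAdd(sum, polyAt(rs[j], coeffs, n));
    }
}

// Hypothetical helper: returns the sum of P(r_j) over all points in rs.
float polySumAtomic(const std::vector<float> &rs, const std::vector<float> &coeffs) {
    int numPoints = static_cast<int>(rs.size());
    int n = static_cast<int>(coeffs.size());
    if (numPoints == 0) return 0.0f;

    float *dRs, *dCoeffs, *dSum;
    cudaMalloc(&dRs, numPoints * sizeof(float));
    cudaMalloc(&dCoeffs, n * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dRs, rs.data(), numPoints * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dCoeffs, coeffs.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dSum, 0, sizeof(float));  // all-zero bits is 0.0f

    int threads = 256;  // assumed block size
    int blocks = (numPoints + threads - 1) / threads;
    polySumAtomicKernel<<<blocks, threads>>>(dRs, numPoints, dCoeffs, n, dSum);

    float sum = 0.0f;
    cudaMemcpy(&sum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dRs);
    cudaFree(dCoeffs);
    cudaFree(dSum);
    return sum;
}
```

With coefficients {1, 2, 3} and points {0, 1, 2}, the three terms are 1, 6, and 17, so the result is 24. Floating-point addition order is not fixed across runs, so for general inputs the last bits of the sum may vary.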

Page 14: CS 179:  Lecture 2 Lab Review  1

Accumulation

Page 15: CS 179:  Lecture 2 Lab Review  1

Shared Memory

Faster than global memory
Per-block
One block

Page 16: CS 179:  Lecture 2 Lab Review  1

Linear Accumulation

atomicAdd() has a choke point!

What if we reduced our results in parallel?
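One reading of "linear accumulation", assumed here because the slide's listing is missing, is that each block first gathers its threads' results in shared memory, one thread then adds them up one by one, and only that per-block total goes through `atomicAdd()`. This cuts the number of atomic operations from one per element to one per block. The sketch sums a plain array of values; in the lab these would be the P(r_j). The names `sumLinearKernel`, `sumLinear`, and `kThreads` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // assumed block size; must match the launch

__global__ void sumLinearKernel(const float *values, int n, float *total) {
    __shared__ float partial[kThreads];  // one slot per thread, per block

    // Each thread privately sums its strided share of the input.
    float mine = 0.0f;
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        mine += values[i];
    }
    partial[threadIdx.x] = mine;
    __syncthreads();  // all slots must be written before thread 0 reads them

    // Thread 0 walks the slots linearly, then issues one atomicAdd per block.
    if (threadIdx.x == 0) {
        float blockSum = 0.0f;
        for (int t = 0; t < blockDim.x; ++t) {
            blockSum += partial[t];
        }
        atomicAdd(total, blockSum);
    }
}

// Hypothetical helper: returns the sum of all values.
float sumLinear(const std::vector<float> &values) {
    int n = static_cast<int>(values.size());
    if (n == 0) return 0.0f;

    float *dValues, *dTotal;
    cudaMalloc(&dValues, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemcpy(dValues, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(float));

    int blocks = (n + kThreads - 1) / kThreads;
    sumLinearKernel<<<blocks, kThreads>>>(dValues, n, dTotal);

    float total = 0.0f;
    cudaMemcpy(&total, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dValues);
    cudaFree(dTotal);
    return total;
}
```

The remaining cost is the loop in thread 0, which takes time proportional to the block size while the other threads in the block sit idle.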

Page 17: CS 179:  Lecture 2 Lab Review  1

Linear Accumulation

Page 18: CS 179:  Lecture 2 Lab Review  1

Linear Accumulation (2)

Page 19: CS 179:  Lecture 2 Lab Review  1

Can we do better?
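The slide leaves this as an open question. One common answer, offered here as an assumption about where the lecture was heading rather than as its stated solution, is a tree reduction in shared memory: at each step half of the active threads add in a partner's value, so a block of B threads finishes in about log2(B) steps instead of B. The names `sumTreeKernel`, `sumTree`, and `kTreeThreads` are hypothetical, the block size must be a power of two for this version, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kTreeThreads = 256;  // assumed block size; must be a power of two

__global__ void sumTreeKernel(const float *values, int n, float *total) {
    __shared__ float partial[kTreeThreads];

    // Each thread privately sums its strided share of the input.
    float mine = 0.0f;
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        mine += values[i];
    }
    partial[threadIdx.x] = mine;
    __syncthreads();

    // Pairwise reduction: the active range halves on every pass.
    for (int half = blockDim.x / 2; half > 0; half /= 2) {
        if (threadIdx.x < half) {
            partial[threadIdx.x] += partial[threadIdx.x + half];
        }
        __syncthreads();  // outside the if, so every thread reaches it
    }

    // partial[0] now holds the block's total; one atomicAdd per block.
    if (threadIdx.x == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Hypothetical helper: returns the sum of all values.
float sumTree(const std::vector<float> &values) {
    int n = static_cast<int>(values.size());
    if (n == 0) return 0.0f;

    float *dValues, *dTotal;
    cudaMalloc(&dValues, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemcpy(dValues, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(float));

    int blocks = (n + kTreeThreads - 1) / kTreeThreads;
    sumTreeKernel<<<blocks, kTreeThreads>>>(dValues, n, dTotal);

    float total = 0.0f;
    cudaMemcpy(&total, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dValues);
    cudaFree(dTotal);
    return total;
}
```

Keeping `__syncthreads()` outside the `if` matters: every thread in the block has to reach the barrier, including those that are no longer adding.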

Page 20: CS 179:  Lecture 2 Lab Review  1

Last notes

minuteman.cms.caltech.edu – the easiest option

CMS accounts!

Office hours

Kevin: Monday, 8-10 PM
Connor: Tuesday, 8-10 PM