Database Operations on GPU Changchang Wu 4/18/2007.

Database Operations on GPU

Changchang Wu

4/18/2007

Outline

• Database Operations on GPU

• Point List Generation on GPU

• Nearest Neighbor Searching on GPU

Database Operations on GPU

Design Issues

• Low bandwidth between GPU and CPU
  – Avoid frame buffer readbacks

• No arbitrary writes
  – Avoid data rearrangements

• Programmable pipeline has poor branching
  – Evaluate branches using fixed-function tests

Design Overview

• Use the depth test functionality of GPUs to perform comparisons

• Implement all possible comparisons: <, <=, >=, >, ==, !=, ALWAYS, NEVER

• Use the stencil test for data validation and for storing the results of comparison operations

• Use occlusion queries to count the number of elements that satisfy a condition

Basic Operations

Basic SQL query: SELECT A FROM T WHERE C

A = attributes or aggregations (SUM, COUNT, MAX, etc.)

T = relational table

C = Boolean combination of predicates (using operators AND, OR, NOT)

Basic Operations

• Predicates – a_i op constant, or a_i op a_j

• op is one of <, >, <=, >=, !=, =, TRUE, FALSE

• Boolean combinations – Conjunctive Normal Form (CNF) expression evaluation

• Aggregations – COUNT, SUM, MAX, MEDIAN, AVG

Predicate Evaluation

• a_i op constant (d)

• Copy the attribute values a_i into the depth buffer

• Define the comparison operation using the depth test

• Draw a screen-filling quad at depth d

glDepthFunc(…)

glStencilOp(fail, zfail, zpass);
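To make the depth/stencil mechanics concrete, here is a minimal CPU emulation of one predicate pass. It assumes the attribute values are already scaled into the [0, 1] depth range and that depth writes are disabled, so the buffer keeps holding a_i. The Python names are hypothetical; the GL_* strings are only labels. The one point worth noticing is that OpenGL's depth function compares the incoming fragment depth (the quad at d) against the stored depth (a_i), so the predicate a_i < d needs the mirrored function GL_GREATER.

```python
import operator

# OpenGL depth functions compare the INCOMING fragment depth with the STORED depth.
DEPTH_FUNCS = {
    "GL_LESS": operator.lt,
    "GL_LEQUAL": operator.le,
    "GL_GREATER": operator.gt,
    "GL_GEQUAL": operator.ge,
    "GL_EQUAL": operator.eq,
    "GL_NOTEQUAL": operator.ne,
    "GL_ALWAYS": lambda incoming, stored: True,
    "GL_NEVER": lambda incoming, stored: False,
}

# The quad at depth d is the incoming value and a_i is the stored value,
# so the predicate "a_i op d" needs the mirrored depth function.
PREDICATE_TO_DEPTH_FUNC = {
    "<": "GL_GREATER", "<=": "GL_GEQUAL",
    ">": "GL_LESS", ">=": "GL_LEQUAL",
    "==": "GL_EQUAL", "!=": "GL_NOTEQUAL",
    "TRUE": "GL_ALWAYS", "FALSE": "GL_NEVER",
}


def eval_predicate(depth_buffer, op, d):
    """Emulate one screen-filling quad drawn at depth d over the stored attributes.

    Starting from a stencil cleared to 0, stencil[i] becomes 1 where the depth
    test passed (as with glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE) and reference 1).
    count is what an occlusion query would report for this pass.
    """
    depth_test = DEPTH_FUNCS[PREDICATE_TO_DEPTH_FUNC[op]]
    stencil = [1 if depth_test(d, a) else 0 for a in depth_buffer]
    return stencil, sum(stencil)


stencil, count = eval_predicate([0.1, 0.5, 0.9], "<", 0.5)
print(stencil, count)  # [1, 0, 0] 1
```

Only 0.1 satisfies a_i < 0.5, so one fragment passes and the occlusion count is 1.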

Predicate Evaluation

• Comparing two attributes:
  – a_i op a_j is treated as (a_i - a_j) op 0

• Semi-linear queries: a linear combination of attributes compared against a constant, Σ_j s_j a_j op b

• Easy to compute with a fragment shader
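A sketch of what the fragment program contributes: it computes the weighted sum per record, and the comparison is then done as in ordinary predicate evaluation. The weights s_j, the constant b, and the function names are hypothetical; the two-attribute case a_i op a_j is simply the weight vector with +1 at i, -1 at j, and b = 0.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def semi_linear_query(records, weights, op, b):
    """Return a 0/1 mask: 1 where sum_j weights[j] * record[j] op b holds.

    On the GPU each record would be one fragment; the weighted sum is the
    per-fragment work done by the fragment program.
    """
    cmp = OPS[op]
    mask = []
    for rec in records:
        value = sum(w * a for w, a in zip(weights, rec))  # per-fragment value
        mask.append(1 if cmp(value, b) else 0)
    return mask


def compare_attributes(records, i, j, op):
    """Evaluate a_i op a_j as (a_i - a_j) op 0."""
    weights = [0.0] * len(records[0])
    weights[i], weights[j] = 1.0, -1.0
    return semi_linear_query(records, weights, op, 0.0)


records = [(3, 1), (1, 3), (2, 2)]
print(compare_attributes(records, 0, 1, ">"))  # [1, 0, 0]
```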

Boolean Combinations

• Expression provided as a CNF

• A CNF is of the form $A_1 \text{ AND } A_2 \text{ AND } \dots \text{ AND } A_k$, where $A_i = (B_{i1} \text{ OR } B_{i2} \text{ OR } \dots \text{ OR } B_{im_i})$

• The CNF contains no NOT operator
  – If the expression has a NOT operator, invert the comparison operation to eliminate it, e.g. NOT (a_i < d) => (a_i >= d)

• For example, to compute a_i within [low, high]:
  – Evaluate as (a_i >= low) AND (a_i <= high)
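One way to evaluate a CNF using only stencil operations is sketched below. This stencil scheme is an assumption for illustration, not necessarily the exact setup of the original work: the stencil value of a record counts how many clauses it has satisfied so far. While processing clause i (1-based), the stencil test lets a predicate touch only records whose stencil equals i - 1, and a passing predicate replaces it with i, so each clause acts as an OR and consecutive clauses act as an AND. NOT is removed beforehand by inverting the comparison. The function and tuple layout are hypothetical.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

# NOT (a op d) is rewritten as a (inverted op) d, so no NOT survives.
INVERT = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "==": "!=", "!=": "=="}


def eval_cnf(records, clauses):
    """Evaluate a CNF over records with a stencil-counter emulation.

    clauses: list of clauses ANDed together; each clause is a list of
    predicates (attr_index, op, d, negated) ORed together.
    Returns a 0/1 mask of records satisfying the whole expression.
    """
    stencil = [0] * len(records)  # stencil cleared to 0
    for i, clause in enumerate(clauses, start=1):
        for attr, op, d, negated in clause:  # one rendering pass per predicate
            if negated:
                op = INVERT[op]
            cmp = OPS[op]
            for r, rec in enumerate(records):  # one fragment per record
                # stencil test: == i - 1 (all earlier clauses satisfied);
                # on depth pass: replace the stencil value with i
                if stencil[r] == i - 1 and cmp(rec[attr], d):
                    stencil[r] = i
    return [1 if s == len(clauses) else 0 for s in stencil]


# Range query: a_0 within [10, 20] is (a_0 >= 10) AND (a_0 <= 20).
records = [(5,), (15,), (25,)]
print(eval_cnf(records, [[(0, ">=", 10, False)], [(0, "<=", 20, False)]]))  # [0, 1, 0]
```

With an 8-bit stencil buffer this counting trick can track up to 255 clauses.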

Algorithm

Range Query

• Compute a_i within [low, high]

• Evaluated as (a_i >= low) AND (a_i <= high)

Aggregations

• COUNT, MAX, MIN, SUM, AVG

• No data rearrangements

COUNT

• Use occlusion queries to get the pixel pass count

• Syntax:
  – Begin occlusion query
  – Perform database operation
  – End occlusion query
  – Get the count of attributes that passed the database operation

• Involves no additional overhead!

MAX, MIN, MEDIAN

• We compute the Kth-largest number (MAX is K = 1, MIN is K = n, MEDIAN is K ≈ n/2)

• Traditional algorithms require data rearrangements

• We perform no data rearrangements, no frame buffer readbacks

K-th Largest Number

• By comparing and counting, determine every bit of the result in order from MSB to LSB
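A sketch of the bit-by-bit search, assuming non-negative integer keys below 2**bits. The inner count_ge(x) is a hypothetical stand-in for one rendering pass: draw a quad at depth x with a ">=" test and read the occlusion-query count. Going from MSB to LSB, a bit is kept set whenever at least K values are still >= the candidate, so no data is ever moved.

```python
def kth_largest(values, k, bits):
    """Return the k-th largest of `values` (non-negative ints below 2**bits)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be in [1, len(values)]")

    def count_ge(x):
        # One pass: quad at depth x, ">=" test, occlusion-query count.
        return sum(1 for v in values if v >= x)

    result = 0
    for i in reversed(range(bits)):  # MSB to LSB
        candidate = result | (1 << i)
        if count_ge(candidate) >= k:  # at least k values are >= candidate
            result = candidate
    return result


S = [10, 24, 37, 99, 192, 200, 200, 232]
print(kth_largest(S, 1, 8))  # 232
```

For this S and K = 1 the candidates tried are 128, 192, 224, 240, 232, 236, 234, 233, the same sequence of quads as in the Parallel Max example.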

Example: Parallel Max

• S = {10, 24, 37, 99, 192, 200, 200, 232}; a value passes if it is >= the depth of the quad

• Step 1: Draw quad at 128 (10000000): 192, 200, 200, 232 pass

• Step 2: Draw quad at 192 (11000000): 192, 200, 200, 232 pass

• Step 3: Draw quad at 224 (11100000): only 232 passes

• Step 4: Draw quad at 240 (11110000): no values pass

• Step 5: Draw quad at 232 (11101000): 232 passes

• Steps 6, 7, 8: Draw quads at 236, 234, 233: no values pass, so Max is 232

Accumulator, Mean

• Accumulator – use a sorting algorithm and add all the values

• Mean – use the accumulator and divide by n

• Interval range arithmetic

• Alternative algorithm
  – Use fragment programs: requires very few renderings
  – Use mipmaps [Harris et al. 02], fragment programs [Coombe et al. 03]

Accumulator

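• Data representation is of the form $a_k 2^k + a_{k-1} 2^{k-1} + \dots + a_0 2^0$, where $a_j \in \{0, 1\}$ is the j-th bit of a value

$\text{Sum} = \left(\sum a_k\right) 2^k + \left(\sum a_{k-1}\right) 2^{k-1} + \dots + \left(\sum a_0\right) 2^0$, where each inner sum runs over all records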

Current GPUs support no bit-masking operations

The Algorithm

A fractional part of a / 2^(i+1) that is >= 0.5 means the i-th bit of a is 1
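A sketch of the bit-sliced sum, assuming non-negative integer values below 2**bits. Because there is no bit masking, bit i is tested arithmetically: writing a = q·2^(i+1) + r, the fractional part of a / 2^(i+1) is r / 2^(i+1), which is >= 0.5 exactly when r >= 2^i, i.e. when bit i is 1. On the GPU each per-bit count would presumably come from an occlusion query over a fragment program that discards records failing the test; the Python loop below stands in for that pass, and the function names are hypothetical.

```python
import math


def bit_is_set(a, i):
    """Bit test without masking: frac(a / 2**(i+1)) >= 0.5 iff bit i of a is 1."""
    x = a / 2 ** (i + 1)
    return x - math.floor(x) >= 0.5


def accumulate(values, bits):
    """Sum = sum over i of 2**i * (number of values whose bit i is 1)."""
    total = 0
    for i in range(bits):
        count_i = sum(1 for a in values if bit_is_set(a, i))  # one pass + count
        total += count_i * 2 ** i
    return total


print(accumulate([10, 24, 37, 99, 192, 200, 200, 232], 8))  # 994
```

The division by a power of two is exact for integers of this size, so the fractional test introduces no rounding error here.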

Implementation

• Algorithm
  – CPU: Intel compiler 7.1 with hyper-threading, multi-threading, SIMD optimizations
  – GPU: NVIDIA Cg compiler

• Hardware
  – Dell Precision Workstation with dual 2.8 GHz Xeon processors
  – NVIDIA GeForce FX 5900 Ultra GPU
  – 2 GB RAM

Benchmarks

• TCP/IP database with 1 million records and four attributes

• Census database with 360K records

Benchmark results (charts not recoverable): copy time, predicate evaluation, range query, multi-attribute query, semi-linear query, Kth-largest (two slides), Kth-largest conditional, accumulator

Analysis: Issues

• Precision

• Copy time

• Integer arithmetic

• Depth compare masking

• Memory management

• No Branching

• No random writes

Analysis: Performance

• Relative performance gain
  – High performance: predicate evaluation, multi-attribute queries, semi-linear queries, count
  – Medium performance: Kth-largest number
  – Low performance: accumulator

High Performance

• Parallel pixel processing engines

• Pipelining

• Early Z-cull

• Eliminate branch mispredictions

Medium Performance

• Parallelism
  – The FX 5900 has a clock speed of 450 MHz and 8 pixel processing engines
  – Rendering a single 1000x1000 quad takes 0.278 ms
  – Rendering 19 such quads takes 5.28 ms; the observed time is 6.6 ms
  – 80% efficiency in parallelism!
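The quoted numbers are consistent with a simple throughput estimate, assuming each of the 8 pixel engines processes one pixel per clock:

```latex
t_{\text{quad}} = \frac{1000 \times 1000}{8 \times 450 \times 10^{6}\ \text{s}^{-1}} \approx 0.278\ \text{ms},
\qquad 19 \times 0.278\ \text{ms} \approx 5.28\ \text{ms},
\qquad \frac{5.28\ \text{ms}}{6.6\ \text{ms}} = 0.8
```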

Low Performance

• No gain over SIMD-based CPU implementation

• Two main reasons:
  – Lack of integer arithmetic
  – Clock rate

Advantages

• Algorithms progress at the GPU growth rate

• Offload CPU work

• Fast due to massive parallelism on GPUs

• Algorithms could be generalized to any geometric shape
  – E.g., max value within a triangular region

• Commodity hardware!

GPU Point List Generation

• Data compaction

Overall task

3D to 2D mapping

Current Problem

The solution

Overview, Data Compaction

Algorithm: Discriminator

Algorithm: Histogram Builder

Histogram Output

Algorithm: PointList Builder

PointList Output

Timing

Reduces a highly sparse matrix with N elements to a list of its M active entries in O(N) + M log N steps.
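A compact sketch of the histogram-pyramid idea for a 2^n x 2^n grid, assuming the Discriminator has already produced a 0/1 activity mask. The Histogram Builder sums 2x2 blocks level by level (O(N) work in total), and the PointList Builder lets each output slot k walk down from the 1x1 top, stepping into the child whose running count range contains k (one step per level, hence the log N factor). Function names and the fixed order of the four children are assumptions made for this illustration.

```python
def build_histopyramid(mask):
    """mask: square list of lists of 0/1 with power-of-two side. Returns levels, base first."""
    levels = [mask]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [prev[2 * r][2 * c] + prev[2 * r][2 * c + 1]
             + prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]
             for c in range(half)]
            for r in range(half)
        ])
    return levels


def extract_point(levels, k):
    """Return (row, col) of the k-th active cell (0-based) by top-down traversal."""
    r = c = 0
    for level in reversed(levels[:-1]):  # from the 2x2 level down to the base
        r, c = 2 * r, 2 * c
        for dr, dc in ((0, 0), (0, 1), (1, 0), (1, 1)):
            count = level[r + dr][c + dc]
            if k < count:  # the k-th active cell lies inside this child
                r, c = r + dr, c + dc
                break
            k -= count  # skip this child's active cells
    return r, c


def point_list(mask):
    levels = build_histopyramid(mask)
    m = levels[-1][0][0]  # total number of active cells
    # Each k is independent, so on the GPU every output slot can be one pixel.
    return [extract_point(levels, k) for k in range(m)]


mask = [[0, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 0, 1, 1]]
print(point_list(mask))
```

The output lists the five active cells in the traversal order fixed by the child loop, which is not row-major order.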

Applications

• Image Analysis
  – Feature Detection

• Volume Analysis

• Sparse Matrix Generation

Searching

• 1D Binary Search

• Nearest Neighbor Search in high-dimensional spaces

• K-NN Search

Binary Search

• Find a specific element in an ordered list
  – Implemented just like the CPU algorithm, assuming the hardware supports long enough shaders
  – Finds the first element with a given value v
  – If v does not exist, finds the smallest element > v

• The search algorithm is sequential, but many searches can be executed in parallel
  – The number of pixels drawn determines the number of searches executed in parallel
  – 1 pixel == 1 search
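A sketch of the per-pixel search, assuming the sorted list length n is a power of two and that reading index n behaves like +infinity. search_kernel is what one pixel would run; parallel_search stands in for drawing one pixel per query. The step pattern is chosen to match the walkthrough slides: start at n/2, take log2(n) steps whose stride halves down to 1 (the last stride-1 step is taken twice), then one cleanup step, because after the strided steps the position is either the answer or one element too far left. Both function names are hypothetical.

```python
def search_kernel(sorted_list, v):
    """Index of the first element >= v (len(sorted_list) if none); len must be a power of two."""
    n = len(sorted_list)

    def at(i):  # reads past the end behave like +infinity
        return sorted_list[i] if i < n else float("inf")

    pos = stride = n // 2
    for _ in range(n.bit_length() - 1):  # log2(n) steps
        stride = max(stride // 2, 1)
        pos = pos - stride if at(pos) >= v else pos + stride
    if at(pos) < v:  # cleanup: possibly one element too far left
        pos += 1
    return pos


def parallel_search(sorted_list, queries):
    """One independent search per query, as if each were one pixel in a single pass."""
    return [search_kernel(sorted_list, q) for q in queries]


# The walkthrough's list v0 v0 v0 v2 v2 v2 v5 v5, with v0, v2, v5 taken as 0, 2, 5.
print(parallel_search([0, 0, 0, 2, 2, 2, 5, 5], [0, 2]))  # [0, 3]
```

Searching for v0 visits positions 4, 2, 1, 0, 0 and searching for v2 visits 4, 2, 3, 2, 3, the same sequences as in the slides.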

Binary Search

• Search for v0

Sorted list (index: value): 0: v0, 1: v0, 2: v0, 3: v2, 4: v2, 5: v2, 6: v5, 7: v5

Initialize: position 4 (the search starts at the center of the sorted array)

v2 >= v0, so search the left half of the sub-array

Binary Search

• Search for v0

Positions: 4 (initialize), 2 (step 1)

v0 >= v0, so search the left half of the sub-array

Binary Search

• Search for v0

Positions: 4 (initialize), 2 (step 1), 1 (step 2)

v0 >= v0, so search the left half of the sub-array

Binary Search

• Search for v0

Positions: 4 (initialize), 2 (step 1), 1 (step 2), 0 (step 3)

At this point, we have either found v0 or are 1 element too far left

One last step to resolve

Binary Search

• Search for v0

Positions: 4 (initialize), 2 (step 1), 1 (step 2), 0 (step 3), 0 (step 4)

Done!

Binary Search

• Search for v0 and v2

Sorted list (index: value): 0: v0, 1: v0, 2: v0, 3: v2, 4: v2, 5: v2, 6: v5, 7: v5

Initialize: both searches start at position 4, the center of the sorted array

Both searches proceed to the left half of the array

Binary Search

• Search for v0 and v2

Search for v0: 4 (initialize), 2 (step 1)

Search for v2: 4 (initialize), 2 (step 1)

The search for v0 continues as before

The search for v2 overshot, so go back to the right

Binary Search

• Search for v0 and v2

Search for v0: 4 (initialize), 2 (step 1), 1 (step 2)

Search for v2: 4 (initialize), 2 (step 1), 3 (step 2)

We’ve found the proper v2, but are still looking for v0

Both searches continue

Binary Search

• Search for v0 and v2

Search for v0: 4 (initialize), 2 (step 1), 1 (step 2), 0 (step 3)

Search for v2: 4 (initialize), 2 (step 1), 3 (step 2), 2 (step 3)

Now we’ve found the proper v0, but overshot v2

The cleanup step takes care of this

Binary Search

• Search for v0 and v2

Search for v0: 4 (initialize), 2 (step 1), 1 (step 2), 0 (step 3), 0 (step 4)

Search for v2: 4 (initialize), 2 (step 1), 3 (step 2), 2 (step 3), 3 (step 4)

Done! Both v0 and v2 are located properly

Binary Search Summary

• Single rendering pass• Each pixel drawn performs independent search

• O(log n) steps

Nearest Neighbor Search

• A fundamental step in similarity search for data mining, retrieval…

• Curse of dimensionality
  – When dimensionality is very high, structures like k-d trees do not help

• Use the GPU to accelerate the linear scan

Distances

• N-norm distance

• Cosine distance: acos(dot(x, y)), for unit-length x and y

Data Representation

• Use separate textures to store different dimensions.

Distance Computation

• Accumulate the distance components of the different dimensions
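A sketch of the linear-scan distance pass, assuming each dimension is stored as its own array (standing in for one texture per dimension). One pass per dimension adds that dimension's component into a running accumulator holding one entry per data point; this covers both the N-norm (L_p) distance and the cosine distance listed earlier. Function names and the parameter p are hypothetical.

```python
import math


def lp_distances(dims, query, p=2):
    """dims[d][i] is coordinate d of point i (one 'texture' per dimension)."""
    acc = [0.0] * len(dims[0])
    for d, texture in enumerate(dims):  # one rendering pass per dimension
        q = query[d]
        for i, x in enumerate(texture):  # one fragment per data point
            acc[i] += abs(x - q) ** p
    return [a ** (1.0 / p) for a in acc]


def cosine_distances(dims, query):
    """acos(dot(x, q)); assumes the points and the query are already unit length."""
    dot = [0.0] * len(dims[0])
    for d, texture in enumerate(dims):  # same per-dimension accumulation
        for i, x in enumerate(texture):
            dot[i] += x * query[d]
    # Clamp to [-1, 1] to guard against rounding before acos.
    return [math.acos(max(-1.0, min(1.0, s))) for s in dot]


# Points (0, 0), (3, 4), (1, 1) stored dimension by dimension.
print(lp_distances([[0, 3, 1], [0, 4, 1]], [0, 0], p=1))  # [0.0, 7.0, 2.0]
```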

Reduction in RGBA

Reduction to find NN
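A sketch of the reduction, assuming the per-point distances sit in a 1D array. Each pass halves the array by keeping the smaller of each pair together with its index, so after about log2(N) passes a single (distance, index) pair remains. The "Reduction in RGBA" step, whose details were in a figure, is not modeled here; the function name is hypothetical.

```python
def reduce_min(distances):
    """Return (min_distance, index) via pairwise halving passes."""
    if not distances:
        raise ValueError("empty input")
    pairs = [(d, i) for i, d in enumerate(distances)]
    while len(pairs) > 1:  # one rendering pass per halving
        nxt = [min(pairs[j], pairs[j + 1])  # ties go to the smaller index
               for j in range(0, len(pairs) - 1, 2)]
        if len(pairs) % 2:  # an odd element carries over unchanged
            nxt.append(pairs[-1])
        pairs = nxt
    return pairs[0]


print(reduce_min([5.0, 0.0, 1.4]))  # (0.0, 1)
```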

Results (two slides of charts, not recoverable)

K-Nearest Neighbor Search

• Given a sample point p, find the k points nearest p within a data set

• On the CPU, this is easily done with a heap or priority queue
  – Can add or reject neighbors as the search progresses
  – We don’t know how to build one efficiently on the GPU

• kNN-grid
  – Can only add neighbors…

kNN-grid Algorithm

(Figure: sample point, candidate neighbor and neighbors found; want 4 neighbors)

kNN-grid Algorithm

• Candidate neighbors must be within max search radius

• Visit voxels in order of distance to sample point

(Figure: want 4 neighbors)

kNN-grid Algorithm

• If current number of neighbors found is less than the number requested, grow search radius

(Figure: want 4 neighbors; 1 found)

kNN-grid Algorithm

(Figure: want 4 neighbors; 2 found)

• If current number of neighbors found is less than the number requested, grow search radius

kNN-grid Algorithm

• Don’t add neighbors outside maximum search radius

• Don’t grow search radius when neighbor is outside maximum radius

(Figure: want 4 neighbors; 2 found)

kNN-grid Algorithm

• Add neighbors within search radius

(Figure: want 4 neighbors; 3 found)

kNN-grid Algorithm

• Add neighbors within search radius

(Figure: want 4 neighbors; 4 found)

kNN-grid Algorithm

• Don’t expand search radius if enough neighbors already found

(Figure: want 4 neighbors; 4 found)

kNN-grid Algorithm

• Add neighbors within search radius

(Figure: want 4 neighbors; 5 found)

kNN-grid Algorithm

• Visit all other voxels accessible within determined search radius

• Add neighbors within search radius

(Figure: want 4 neighbors; 6 found)

kNN-grid Summary

• Finds all neighbors within a sphere centered about sample point

• May locate more than requested k-nearest neighbors

(Figure: want 4 neighbors; 6 found)
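A 2D sketch of the kNN-grid rules listed on these slides, assuming points are bucketed in a uniform grid of cell size h and voxels are visited in increasing order of their minimum distance to the sample point. The grid layout, the function names, and the full sort of voxels are assumptions for illustration; a GPU version would step through neighboring voxels incrementally instead. While fewer than k neighbors are found, the radius grows to include each new candidate inside max_radius; afterwards only candidates inside the current radius are added, which is why more than k neighbors can come back.

```python
import math
from collections import defaultdict


def build_grid(points, h):
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / h), math.floor(p[1] / h))].append(p)
    return grid


def cell_min_dist(cell, h, q):
    """Distance from q to the closest point of the axis-aligned cell."""
    cx, cy = cell
    dx = max(cx * h - q[0], 0.0, q[0] - (cx + 1) * h)
    dy = max(cy * h - q[1], 0.0, q[1] - (cy + 1) * h)
    return math.hypot(dx, dy)


def knn_grid(grid, h, q, k, max_radius):
    """Return (neighbors, radius); may return more than k neighbors."""
    found, radius = [], 0.0
    for cell in sorted(grid, key=lambda c: cell_min_dist(c, h, q)):
        limit = radius if len(found) >= k else max_radius
        if cell_min_dist(cell, h, q) > limit:
            break  # every remaining voxel is farther away
        for p in grid[cell]:
            d = math.dist(p, q)
            if d > max_radius:
                continue  # never add, or grow toward, points outside max radius
            if len(found) < k:
                radius = max(radius, d)  # grow the search radius to include p
                found.append(p)
            elif d <= radius:
                found.append(p)  # enough found: add only inside the radius
    return found, radius


pts = [(0.5, 0.5), (1.5, 0.5), (0.2, 1.8), (3.5, 3.5), (0.9, 0.1)]
neighbors, r = knn_grid(build_grid(pts, 1.0), 1.0, (0.5, 0.5), 2, 2.0)
print(neighbors, r)
```

Every point within the returned radius is returned, so the true k nearest neighbors (when at least k lie inside max_radius) are always among the results.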

References

• Naga Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin and Dinesh Manocha, Fast Computation of Database Operations using Graphics Processors, http://www.gpgpu.org/s2004/slides/govindaraju.DatabaseOperations.ppt

• Benjamin Bustos, Oliver Deussen, Stefan Hiller and Daniel Keim, A Graphic Hardware Accelerated Algorithm for Nearest Neighbor Search

• Gernot Ziegler, Art Tevs, Christian Theobalt and Hans-Peter Seidel, GPU Point List Generation through Histogram Pyramids, http://www.mpi-inf.mpg.de/~gziegler/gpu_pointlist/

• Tim Purcell, Sorting and Searching, http://www.gpgpu.org/s2005/slides/purcell.SortingAndSearching.ppt