GPU-Accelerated Genetic Algorithms

Page 1: GPU-Accelerated Genetic Algorithms

GPU-Accelerated Genetic Algorithms

Rajvi Shah+, P J Narayanan+, Kishore Kothapalliˆ
IIIT Hyderabad, Hyderabad, India

+ : Center for Visual Information Technology
ˆ : Center for Security, Theory and Algorithmic Research

Page 2: GPU-Accelerated Genetic Algorithms

International Institute of Information Technology, Hyderabad, India

GAs – an introduction

Genetic Algorithms
- A class of evolutionary algorithms
- Efficiently solves optimization tasks
- Potential applications in many fields

Challenges
- Large execution time

Page 3: GPU-Accelerated Genetic Algorithms

Typical flow of a GA

User specifies:
- A representation for chromosome
- GA parameters
- Crossover operator
- Mutation operator
- Termination criteria
- A method for fitness evaluation

Flow:
1. Create Initial Population
2. Evaluate Fitness
3. Terminate? Yes: Exit. No: continue.
4. Select Parents
5. Create New Population, then return to step 2
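The flow above can be sketched as a short serial loop. This is a minimal Python illustration, not the authors' GPU implementation: the names `run_ga` and `one_max`, the parameter defaults, the `target` termination criterion, and the choice of uniform selection, one-point crossover, and flip mutation for this toy run are all assumptions made only to keep the example runnable.

```python
import random

def one_max(chromosome):
    """Hypothetical fitness function: the number of 1 genes."""
    return sum(chromosome)

def run_ga(fitness, length=16, pop_size=20, generations=50,
           p_mutation=0.02, target=None, seed=0):
    rng = random.Random(seed)
    # Create initial population of random binary strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate fitness.
        scores = [fitness(c) for c in population]
        # Terminate? (assumed criterion: a target score has been reached)
        if target is not None and max(scores) >= target:
            break
        new_population = []
        while len(new_population) < pop_size:
            # Select parents (uniform selection).
            p1, p2 = rng.choice(population), rng.choice(population)
            # One-point crossover produces two children per pair.
            point = rng.randrange(1, length)
            for child in (p1[:point] + p2[point:], p2[:point] + p1[point:]):
                # Flip mutation, applied gene by gene.
                child = [g ^ 1 if rng.random() < p_mutation else g
                         for g in child]
                new_population.append(child)
        population = new_population[:pop_size]
    scores = [fitness(c) for c in population]
    best = max(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]

best, score = run_ga(one_max, target=16)
```

Every step inside the generation loop operates on each chromosome independently, which is what the following slides exploit.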

Page 4: GPU-Accelerated Genetic Algorithms

Accelerating Genetic Algorithms

High degree of parallelism
- Fitness evaluation
- Crossover
- Mutation

Most obvious: chromosome-level parallelism
- Same operations on each chromosome
- Use a thread per chromosome

Page 5: GPU-Accelerated Genetic Algorithms

Gene-level Parallelism

Thread-per-chromosome model
- Good enough for small to moderate-sized multi-core systems
- Doesn't map well to massively multithreaded GPUs

Solution: identify and exploit gene-level parallelism

Page 6: GPU-Accelerated Genetic Algorithms

CUDA

Page 7: GPU-Accelerated Genetic Algorithms

Our Approach

- A column of threads reads a chromosome gene by gene, and the threads cooperate to perform operations
- Results in coalesced reads and faster processing

(Figure: Population Matrix in Memory mapped to Thread Blocks in a grid)
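One way to picture the coalesced read is through the address arithmetic. The sketch below assumes, purely for illustration, that the population is stored as a row-major N x L matrix and that cooperating threads with consecutive indices handle consecutive genes; the slides do not state the actual layout, and the helper names `gene_address` and `addresses_for_chromosome` are hypothetical.

```python
def gene_address(chromosome_idx, gene_idx, chromosome_length):
    """Flat offset of one gene in an (assumed) row-major N x L matrix."""
    return chromosome_idx * chromosome_length + gene_idx

def addresses_for_chromosome(chromosome_idx, chromosome_length):
    """Offsets touched by the L threads that cooperate on one chromosome."""
    return [gene_address(chromosome_idx, x, chromosome_length)
            for x in range(chromosome_length)]

# Threads cooperating on chromosome 2 (length 8) touch consecutive offsets.
# Neighbouring threads reading neighbouring addresses is the access pattern
# a GPU can combine into a small number of memory transactions.
offsets = addresses_for_chromosome(2, 8)
```

With a thread-per-chromosome mapping under the same assumed layout, neighbouring threads would instead read addresses L apart, which is the pattern the gene-level mapping avoids.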

Page 8: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram with regions labeled "On CPU" and "On GPU")
- Setup steps: Parse GA Parameters, Construct Initial Population, Generate Random Numbers
- GPU Global Memory: Random Numbers, Old Population, New Population, Fitness Scores, Statistics
- Kernels: Evaluation Kernel, Statistics Update Kernel, Selection Kernel, Crossover Kernel, Mutation Kernel

Page 9: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram repeated. Highlighted: Evaluation Kernel, with data labels Population and Scores)

Page 10: GPU-Accelerated Genetic Algorithms

Fitness Evaluation

Partially parallel method
- The user specifies a serial code fragment for fitness evaluation.
- Threads are arranged in a 1D grid.
- Each thread executes the user's code on one chromosome, providing chromosome-level parallelism.
- Benefit: Abstraction

Fully parallel method
- A user familiar with CUDA can effectively use a 2D thread layout.
- Uses gene-level parallelism for fitness evaluation.
- Benefit: Efficiency

Page 11: GPU-Accelerated Genetic Algorithms

Example: 0/1 Knapsack

Task
- Given weights, costs, and knapsack capacity
- Aim: maximize the cost

Representation
- 1D binary string; 0/1 marks the absence/presence of an item
- W and C are the total weight and cost of a given representation

Best solution: the one with maximum C, given W < Wmax

Fully parallel method
- Use a group of threads to compute the total cost and weight in logarithmic time
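The logarithmic-time sum can be illustrated with a pairwise (tree) reduction. The sketch below runs serially in Python, with the inner loop standing in for the threads of one step; the function names, the returned step count, and the rule that an infeasible chromosome scores 0 are assumptions, since the slides do not specify how over-capacity solutions are penalized.

```python
def tree_reduce_sum(values):
    """Pairwise sum; returns (total, number of parallel steps).

    Each pass of the while loop is one parallel step: all additions inside
    it are independent, so on a GPU each would be done by its own thread.
    """
    data = list(values)
    n = len(data)
    stride, steps = 1, 0
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
        steps += 1
    return (data[0] if data else 0), steps

def knapsack_fitness(chromosome, weights, costs, capacity):
    """Total cost of the selected items, or 0 if the knapsack overflows."""
    picked_w = [w * g for w, g in zip(weights, chromosome)]
    picked_c = [c * g for c, g in zip(costs, chromosome)]
    total_w, _ = tree_reduce_sum(picked_w)
    total_c, _ = tree_reduce_sum(picked_c)
    # Feasibility test follows the slide: W < Wmax.
    # Assumption: infeasible chromosomes get fitness 0.
    return total_c if total_w < capacity else 0

weights, costs = [2, 3, 4, 5], [3, 4, 5, 6]
feasible = knapsack_fitness([1, 0, 0, 1], weights, costs, capacity=8)
overweight = knapsack_fitness([1, 1, 1, 0], weights, costs, capacity=8)
```

For a chromosome of length L the reduction needs about log2(L) steps instead of the L - 1 sequential additions of the partially parallel method.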

Page 12: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram repeated. Highlighted: Evaluation Kernel and Statistics Update Kernel, with data labels Scores and Statistics)

Page 13: GPU-Accelerated Genetic Algorithms

Statistics

- Selection and termination most often use population statistics.
- We use a standard parallel reduce algorithm to calculate the maximum, minimum, and average scores.
- We use the highly optimized public library CUDPP to sort and rank chromosomes.
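A parallel reduce works for any associative, commutative operator, so the same routine yields the maximum, minimum, and sum of the scores. The following is a serial Python sketch of that idea, not the authors' kernel; `tree_reduce` and `population_statistics` are hypothetical names, and the CUDPP-based sorting and ranking is not shown.

```python
import operator

def tree_reduce(values, op):
    """Reduce with an associative, commutative operator by halving.

    Every element in the first half is combined with its partner in the
    second half; on a GPU each combination would run in its own thread.
    """
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    while len(data) > 1:
        half = (len(data) + 1) // 2
        data = [op(data[i], data[i + half]) if i + half < len(data) else data[i]
                for i in range(half)]
    return data[0]

def population_statistics(scores):
    """Max, min, and average fitness, each obtained by a tree reduction."""
    return {
        "max": tree_reduce(scores, max),
        "min": tree_reduce(scores, min),
        "avg": tree_reduce(scores, operator.add) / len(scores),
    }

stats = population_statistics([4, 9, 1, 6, 5])
```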

Page 14: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram repeated. Highlighted: Selection Kernel, with data labels Statistics and Parents)

Page 15: GPU-Accelerated Genetic Algorithms

Selection

Selection Kernel
- Uses N/2 threads
- Each thread selects two parents for producing offspring

Uniform Selection: selects parents uniformly at random

Roulette Wheel Selection: a fitness-based approach; the higher the fitness, the better the chance of selection

Page 16: GPU-Accelerated Genetic Algorithms

Selection

Roulette Wheel
1. Sort fitness scores.
2. Compute a roulette wheel array by doing a prefix-sum scan of the scores and normalizing it.
3. Generate a random number in the range 0 to 1.
4. Perform a binary search in the roulette wheel array for the nearest number smaller than the random number.
5. Return the index of the result in the array.
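The scan-and-search steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' kernel: it uses an exclusive prefix sum so that the selected index is the last wheel entry not exceeding the random number, it assumes all scores are positive, and it omits the sort step because the scan and search work for any ordering. The function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
import random

def build_roulette_wheel(scores):
    """Normalized exclusive prefix sum: wheel[i] is where slot i starts."""
    total = sum(scores)
    exclusive_scan = accumulate([0] + list(scores[:-1]))
    return [s / total for s in exclusive_scan]

def spin(wheel, r):
    """Binary search for the last wheel entry that is <= r."""
    return bisect_right(wheel, r) - 1

def select_parents(scores, n_pairs, rng):
    """Each of the n_pairs 'threads' picks two parent indices."""
    wheel = build_roulette_wheel(scores)
    return [(spin(wheel, rng.random()), spin(wheel, rng.random()))
            for _ in range(n_pairs)]

wheel = build_roulette_wheel([1, 3, 6])   # slots start at 0.0, 0.1, 0.4
pairs = select_parents([1, 3, 6], n_pairs=4, rng=random.Random(1))
```

A chromosome with score 6 owns the slot from 0.4 to 1.0, so it is chosen six times as often as the chromosome with score 1, which matches the fitness-proportional behaviour described above.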

Page 17: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram repeated. Highlighted: Crossover Kernel, with data labels Old Population and New Population)

Page 18: GPU-Accelerated Genetic Algorithms

Crossover

(Figure: GPU Global Memory holds the Population and three arrays, Parent1, Parent2, and Crossover. Thread index y selects one entry from each array; thread index x runs over genes 1 to L.)

Example array values shown in the figure:
Parent1:   02 08 12 05 15
Parent2:   04 13 07 19 14
Crossover: 03 02 02 04 01
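A gene-level one-point crossover consistent with the Parent1, Parent2, and Crossover labels can be sketched as below. This is an assumption-laden illustration: the results slide names one-point crossover, but the exact indexing, the zero-based convention, and the choice to emit one offspring per parent pair are simplifications of this sketch, and `crossover_gene` and `one_point_crossover` are hypothetical names.

```python
def crossover_gene(population, parent1, parent2, point, x, y):
    """Work of one thread (x, y): produce gene x of offspring y.

    Genes before the crossover point come from the first parent,
    the remaining genes from the second parent.
    """
    source = parent1[y] if x < point[y] else parent2[y]
    return population[source][x]

def one_point_crossover(population, parent1, parent2, point):
    """Serial stand-in for a 2D grid of threads (x over genes, y over offspring)."""
    length = len(population[0])
    return [[crossover_gene(population, parent1, parent2, point, x, y)
             for x in range(length)]
            for y in range(len(parent1))]

# Toy population in which every gene equals its chromosome's index,
# so the origin of each offspring gene is visible in the output.
population = [[0] * 4, [1] * 4, [2] * 4]
offspring = one_point_crossover(population,
                                parent1=[0, 2], parent2=[1, 0], point=[1, 3])
```

Because each gene of each offspring depends only on the three array entries for its row, every thread can work without synchronizing with the others.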

Page 19: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram repeated. Highlighted: Mutation Kernel, with data label New Population)

Page 20: GPU-Accelerated Genetic Algorithms

Mutation

Flip Mutator
- Each thread handles one gene and mutates it with the probability of mutation

(Figure: the Population drawn as a grid of genes indexed by Thread Id x and Thread Id y, with a callout for thread (1,4) showing its Flip Coin step, Coin State, and Gene)

Page 21: GPU-Accelerated Genetic Algorithms

Mutation

(Figure: the same Population grid, indexed by Thread Id x and Thread Id y, shown next to a matching grid of coin states, mostly F with a few T, and the callout for thread (1,4) with its Coin State and Gene)

Flip Mutator
- Each thread handles one gene and mutates it with the probability of mutation
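The flip mutator can be sketched with one coin flip per gene. In this Python illustration the nested loops stand in for the 2D grid of threads, and each coin is read from a pre-generated pool of random numbers; the pool indexing `y * length + x` and the name `flip_mutate` are assumptions, not details taken from the slides.

```python
def flip_mutate(population, random_pool, p_mutation):
    """Flip each binary gene whose coin comes up true.

    Thread (x, y) would handle gene x of chromosome y; its coin is true
    when its pool entry is below the mutation probability.
    """
    length = len(population[0])
    mutated = []
    for y, chromosome in enumerate(population):
        row = []
        for x, gene in enumerate(chromosome):
            coin = random_pool[y * length + x] < p_mutation
            row.append(gene ^ 1 if coin else gene)
        mutated.append(row)
    return mutated

population = [[0, 1], [1, 0]]
pool = [0.9, 0.01, 0.5, 0.0]          # one pre-generated number per gene
result = flip_mutate(population, pool, p_mutation=0.05)
```

Only the genes whose pool entries fall below 0.05 change, so most coin states are false when the mutation probability is small.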

Page 22: GPU-Accelerated Genetic Algorithms

Program Execution Flow

(Flow diagram repeated. Highlighted: Generate Random Numbers, with data label Random No.s)

Page 23: GPU-Accelerated Genetic Algorithms

Random Number Generation

- Extensive use of random numbers
- No primitive for on-the-fly generation of a single random number
- Solution: generate a pool of random numbers and copy it to the GPU
- We use a CUDPP routine to generate a large pool of random numbers on the GPU (faster)
- If better-quality random numbers are needed, this can be replaced by a CPU-based routine
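The pool idea can be illustrated as follows: all random numbers are generated up front, and each thread later reads its own entries by index instead of calling a generator. This Python sketch uses the standard `random` module as a stand-in for the CPU-based routine; the wrap-around indexing scheme and the names `make_random_pool` and `pool_value` are assumptions for illustration only.

```python
import random

def make_random_pool(size, seed=None):
    """Pre-generate `size` uniform random numbers in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]

def pool_value(pool, thread_id, draw, n_threads):
    """The draw-th random number for a given thread.

    Consecutive threads read consecutive pool entries; the index wraps
    around when the pool is exhausted (an assumption of this sketch).
    """
    return pool[(draw * n_threads + thread_id) % len(pool)]

pool = make_random_pool(12, seed=7)
value = pool_value(pool, thread_id=2, draw=1, n_threads=4)
```

Wrapping around reuses numbers, so a real run would size the pool to cover every draw needed in a generation or refill it between generations.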

Page 24: GPU-Accelerated Genetic Algorithms

Results

Test Device: a quarter of Nvidia Tesla S1030 GPU

Test Problem: solve a 0/1 knapsack problem

Test Parameters
- Representation: a 1D binary string
- Crossover: one-point crossover
- Mutation: flip mutation
- Selection: uniform and roulette wheel

Page 25: GPU-Accelerated Genetic Algorithms

Results

(Plots)
- Average run-time for 100 iterations (Uniform Selection)
- Average run-time for 100 iterations (Roulette Wheel Selection)
- Growth in run-time with increasing N x L (N: population size, L: chromosome length)

Page 26: GPU-Accelerated Genetic Algorithms

Scope
- Our approach is modeled after GAlib and maintains structures for GA, Genome, and Statistics.
- It is built with enough abstraction from the user program that the user does not need to know the CUDA architecture or CUDA programming.
- This can be extended to build a GPU-accelerated GA library.

Page 27: GPU-Accelerated Genetic Algorithms

Thank you
