Evolving Universal Hash Functions Using Genetic Algorithms
Ramprasad Joshi, Mustafa Safdari
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
GOA CAMPUS
2009 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER AND COMMUNICATION
Outline
Introduction, Implementation of Genetic Algorithms, Simulation and Results, Conclusion and Future Work
Introduction
Universal Hash Functions, Selecting h randomly
Universal Hash Functions
Mapping integers in the range [0, M-1] to [0, N-1].
A set H of hash functions is universal if, for any two distinct keys j and k and a hash function h chosen uniformly at random from H,
$$\Pr[h(j) = h(k)] \le \frac{1}{N}$$
The expected number of collisions for any key is then at most n/N, where n is the number of keys stored.
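The n/N figure follows from the universality bound by linearity of expectation. A short worked derivation, where the indicator notation and the symbol C_k are introduced here only for illustration:

```latex
% n keys are stored; C_k counts the other keys that collide with a fixed key k.
\begin{align*}
C_k &= \sum_{j \ne k} \mathbf{1}[h(j) = h(k)] \\
\mathbb{E}[C_k] &= \sum_{j \ne k} \Pr[h(j) = h(k)]
  \le \sum_{j \ne k} \frac{1}{N} = \frac{n-1}{N} < \frac{n}{N}
\end{align*}
```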
Selecting h randomly
One such type of hash function:
$$h_{a,b}(k) = ((ak + b) \bmod p) \bmod N$$
where p is a prime number with $M \le p < 2M$, and a, b are any two random integers with $0 < a < p$, $0 \le b < p$.
How do we select a, b, p? Minimize collisions as much as possible.
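A minimal Python sketch of this hash family and of drawing one member at random. The function names are illustrative, not from the slides, and the bounds 0 < a < p, 0 <= b < p are the usual ones for this family, assumed here. The values p = 701 and N = 11 are those reported in row 2 of Table I.

```python
import random

def make_hash(a: int, b: int, p: int, N: int):
    """Return h(k) = ((a*k + b) mod p) mod N as a callable."""
    def h(k: int) -> int:
        return ((a * k + b) % p) % N
    return h

def random_hash(p: int, N: int, rng: random.Random):
    """Draw a and b uniformly with 0 < a < p and 0 <= b < p.

    The caller is assumed to pass a prime p with M <= p < 2M.
    """
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return make_hash(a, b, p, N), a, b

rng = random.Random(0)                    # fixed seed only for repeatability
h, a, b = random_hash(p=701, N=11, rng=rng)
buckets = [h(k) for k in (3, 42, 499)]    # every value lies in [0, N-1]
```

Random selection fixes a and b without looking at the keys; the GA described next searches over the same (a, b) space using the observed collisions.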
Implementation of GA
Chromosome, Fitness Function, Crossover, Mutation
p_values, p_Array
Elements of the GA
- Chromosome:
- Fitness function:
- Crossover types: single point, 2 point, midway and random
- Mutation: single point, multi point
- Selection: Roulette Wheel Selection
Chromosome (64 bits) = a (32 bits) + b (32 bits)
a (32 bits): 010010000111010111
b (32 bits): 010010000111010111
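A minimal sketch of the 64-bit chromosome with single-point crossover and point mutation. Placing a in the high 32 bits and b in the low 32 bits is an assumption (the slide only shows a and b side by side), and all function names are illustrative.

```python
import random

BITS = 32                       # width of each gene, as on the slide
MASK = (1 << BITS) - 1

def encode(a: int, b: int) -> int:
    """Pack a (high 32 bits, assumed) and b (low 32 bits) into one chromosome."""
    return ((a & MASK) << BITS) | (b & MASK)

def decode(chrom: int) -> tuple[int, int]:
    """Split a 64-bit chromosome back into (a, b)."""
    return (chrom >> BITS) & MASK, chrom & MASK

def single_point_crossover(x: int, y: int, point: int) -> tuple[int, int]:
    """Exchange the low `point` bits of two 64-bit chromosomes."""
    low = (1 << point) - 1
    high = ((1 << (2 * BITS)) - 1) ^ low
    return (x & high) | (y & low), (y & high) | (x & low)

def mutate(chrom: int, n_points: int, rng: random.Random) -> int:
    """Flip n_points distinct random bits (1 = single point, 2+ = multi point)."""
    for pos in rng.sample(range(2 * BITS), n_points):
        chrom ^= 1 << pos
    return chrom

x, y = encode(9, 7), encode(3, 2)
child1, child2 = single_point_crossover(x, y, point=32)   # swaps the b genes
mutant = mutate(x, n_points=2, rng=random.Random(0))
```

With the crossover point at bit 32 the two parents simply exchange their b genes; any other point splits one of the genes between the children.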
$$\text{Fitness} = \frac{n_{filled}}{n_{collisions} + 1}$$
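A minimal sketch of how the two counts behind the fitness function could be computed, together with one spin of roulette wheel selection. Two points are assumptions rather than statements from the slides: n_filled is read as the number of occupied buckets and n_collisions as the number of keys landing in an already-occupied bucket (consistent with n_filled + n_collisions = n in nearly all rows of Table I), and the ratio n_filled / (n_collisions + 1) is an assumed reading of the fitness formula. Function names are illustrative.

```python
import random

def count_filled_and_collisions(keys, a, b, p, N):
    """Return (n_filled, n_collisions) for h(k) = ((a*k + b) mod p) mod N."""
    occupied = set()
    collisions = 0
    for k in keys:
        bucket = ((a * k + b) % p) % N
        if bucket in occupied:
            collisions += 1          # key landed in an already-used bucket
        else:
            occupied.add(bucket)
    return len(occupied), collisions

def fitness(keys, a, b, p, N):
    """ASSUMED form: n_filled / (n_collisions + 1); the +1 avoids division by zero."""
    n_filled, n_collisions = count_filled_and_collisions(keys, a, b, p, N)
    return n_filled / (n_collisions + 1)

def roulette_select(population, scores, rng):
    """Spin the wheel once: each individual owns a slice proportional to its score."""
    spin = rng.random() * sum(scores)
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if spin < running:
            return individual
    return population[-1]            # guard against floating-point round-off

# Hypothetical keys 0..9 with the p, a, b, N values of row 1 in Table I.
perfect = count_filled_and_collisions(range(10), a=3, b=2, p=11, N=10)   # (10, 0)
clash = count_filled_and_collisions([3, 10], a=3, b=2, p=11, N=10)       # (1, 1)
```

Individuals with higher fitness occupy a larger slice of the wheel, so they are chosen as parents more often without excluding weaker ones entirely.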
p_values, p_Array
p is any prime number such that M ≤ p < 2M.
An array called p_values keeps track of the allowable values of p so that they can be used in the above steps. p_values can be constructed and populated using any sieve algorithm (from primality testing) that finds the prime numbers within a range. Our implementation uses the Sieve of Eratosthenes.
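A minimal sketch of building p_values with the Sieve of Eratosthenes. The function name build_p_values and the requirement M >= 2 are assumptions made for this example.

```python
def build_p_values(M: int) -> list[int]:
    """Primes p with M <= p < 2M, in ascending order (assumes M >= 2)."""
    limit = 2 * M
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    i = 2
    while i * i < limit:
        if is_prime[i]:
            # Cross out every multiple of i, starting at i*i.
            for multiple in range(i * i, limit, i):
                is_prime[multiple] = False
        i += 1
    return [p for p in range(M, limit) if is_prime[p]]

p_values = build_p_values(10)   # [11, 13, 17, 19]
```

The sieve runs once per key range M, so its cost is paid before the GA starts rather than inside the generation loop.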
p_Array
For every chromosome (a, b) there is an associated value of p such that
$$0 < a < p, \quad 0 \le b < p \qquad (1)$$
To store this information for each chromosome, we create a separate array called p_Array, which stores, for each chromosome, the index in p_values of its prime number. For example, if a chromosome in the population has a=9, b=7, p=4, the value of p assigned to this chromosome is the one found in p_values at index 4.
Index values of p don't undergo crossover or mutation; only a and b do. After each such operation, however, a suitable p is found for the resulting (a, b) pair if the one associated with the parent chromosome doesn't satisfy (1).
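A minimal sketch of the repair step for p_Array. The slides say only that "a suitable p is found"; choosing the smallest prime in p_values that exceeds both a and b is an assumption made here, as are the function names and the sample p_values (the primes in [50, 100)).

```python
import bisect

def satisfies_condition_1(a: int, b: int, p: int) -> bool:
    """Condition (1), assumed to be 0 < a < p and 0 <= b < p."""
    return 0 < a < p and 0 <= b < p

def repair_p_index(a, b, p_index, p_values):
    """Keep the parent's index if (1) still holds, else pick a new one.

    Returns None when no prime in p_values can satisfy (1).
    p_values must be sorted in ascending order.
    """
    if satisfies_condition_1(a, b, p_values[p_index]):
        return p_index
    if a <= 0 or b < 0:
        return None                              # no p can fix a = 0
    i = bisect.bisect_right(p_values, max(a, b))  # first prime > max(a, b)
    return i if i < len(p_values) else None

p_values = [53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
p_array = [4]                       # chromosome 0 has a=9, b=7, p index 4
p = p_values[p_array[0]]            # 71
new_index = repair_p_index(75, 7, p_array[0], p_values)   # 71 too small for a=75
```

Keeping only an index in p_Array means crossover and mutation never have to produce a prime themselves; they alter a and b, and the index is patched afterwards.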
Simulations and Results
Simulation Settings, Results
Simulation Settings
- No. of generations = 30
- Size of population = 50
- pc = 0.8 (crossover probability), pm = 0.01 (mutation probability)
- Input set of keys: uniformly randomly generated in (0, 50000); different sets of size 10, 100, 1000, 10000
- N: taking N as prime gives better results
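A minimal sketch of preparing inputs under these settings. Picking N as the smallest prime at or above the number of keys is an assumption (the slides say only that a prime N gives better results), and the helper names and seed are illustrative.

```python
import random

def is_prime(x: int) -> bool:
    """Trial division; adequate for the small bucket counts used here."""
    if x < 2:
        return False
    i = 2
    while i * i <= x:
        if x % i == 0:
            return False
        i += 1
    return True

def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n

def make_key_set(size, rng, low=0, high=50000):
    """`size` integers drawn uniformly from the open interval (low, high)."""
    return [rng.randrange(low + 1, high) for _ in range(size)]

rng = random.Random(2009)           # arbitrary seed, for repeatability only
key_sets = {size: make_key_set(size, rng) for size in (10, 100, 1000, 10000)}
bucket_counts = {size: next_prime(size) for size in key_sets}
# bucket_counts == {10: 11, 100: 101, 1000: 1009, 10000: 10007}
```

Keys are drawn with replacement here, so a large set may contain duplicates; sampling without replacement would be an equally reasonable reading of the settings.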
Results
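TABLE I. RESULTS OF RUNNING THE ALGORITHM FOR RANDOM INPUT DISTRIBUTIONS

Sr. No. | Range of input    | Crossover type* | Mutation type* | No. of keys n | No. of buckets N | No. of initial collisions | n_collisions | n_filled | p     | a     | b
--------|-------------------|-----------------|----------------|---------------|------------------|---------------------------|--------------|----------|-------|-------|------
1       | 0-10              | 1               | 2              | 10            | 10               | 0                         | 0            | 10       | 11    | 3     | 2
2       | 0-500             | 1               | 2              | 10            | 11               | 1                         | 4            | 6        | 701   | 67    | 452
3       | 0-600             | 1               | 2              | 20            | 23               | 2                         | 2            | 18       | 1013  | 626   | 635
4       | 0-100             | 1               | 1              | 100           | 100              | 0                         | 0            | 100      | 179   | 109   | 114
5       | 0-50000           | 1               | 2              | 100           | 101              | 8                         | 21           | 79       | 98869 | 54339 | 35059
6       | 0-1000            | 1               | 2              | 500           | 499              | 0                         | 1            | 499      | 1823  | 747   | 581
7       | 0-50000           | 1               | 2              | 500           | 499              | 37                        | 108          | 392      | 69313 | 46631 | 9950
8       | 5×10^4 - 6×10^4   | 1               | 2              | 10000         | 10000            | 0                         | 0            | 10000    | 14153 | 9347  | 517
9       | 2×10^4 - 5×10^4   | 1               | 2              | 10000         | 10000            | 0                         | 0            | 10000    | 57203 | 25869 | 37769
10      | 0-50000           | 1               | 2              | 10000         | 10000            | 911                       | 2397         | 6692     | 79063 | 33068 | 31178

* Indices of the crossover and mutation types as listed in the previous section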
Case 1
Multi-point mutation (2 points) gave much better results in fewer generations than single-point mutation or mutation at more than 2 points. Single-point random crossover was found to produce much better results.
The algorithm converged within 7-8 generations in the worst case.
In some cases, where the range of the input distribution was very large and did not coincide with [0, N-1], the number of collisions was relatively high. However, this number was drastically reduced when N was taken as a prime number in the nearby range.
Case 2 (Comparative Runs)
In the next type of simulation, the algorithm was tested against selecting h at random.
The algorithm performed better than random selection, giving fewer collisions on all 10 input files.
Table 2. Results of Comparative Run 1
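Input file | n_collisions by random selection | n_collisions by GA-generated function
-----------|----------------------------------|--------------------------------------
1          | 286                              | 251
2          | 273                              | 256
3          | 267                              | 245
4          | 285                              | 244
5          | 285                              | 255
6          | 285                              | 262
7          | 281                              | 259
8          | 273                              | 255
9          | 273                              | 258
10         | 304                              | 259

Settings for GA: P=100, N=1423, pc=0.75 (1), pm=0.01 (1)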
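A compact sketch of what such a comparative run could look like; it is not the authors' implementation. Assumptions made for this example: P is read as the population size; 30 generations are used; p is held fixed at 98869 (the value in row 5 of Table I) and raw genes are folded back into range instead of repairing p through p_Array; fitness is taken as n_filled / (n_collisions + 1); mutation flips one bit per chromosome with probability pm; and the 300 keys are a hypothetical input file.

```python
import random

BITS = 32                              # gene width from the chromosome slide
LOW32 = (1 << BITS) - 1

def n_collisions(keys, a, b, p, N):
    """Keys that land in a bucket some earlier key already occupies."""
    seen, hits = set(), 0
    for k in keys:
        bucket = ((a * k + b) % p) % N
        if bucket in seen:
            hits += 1
        else:
            seen.add(bucket)
    return hits

def random_pick(keys, N, p, seed=0):
    """Baseline: one (a, b) drawn at random. Returns (collisions, a, b)."""
    rng = random.Random(seed)
    a, b = rng.randrange(1, p), rng.randrange(p)
    return n_collisions(keys, a, b, p, N), a, b

def run_ga(keys, N, p, pop_size=100, generations=30, pc=0.75, pm=0.01, seed=0):
    """Return (collisions, a, b) for the best individual seen; p stays fixed."""
    rng = random.Random(seed)
    n = len(keys)

    def fold(chrom):
        # Simplification: fold raw genes back into 0 < a < p, 0 <= b < p.
        return 1 + (chrom >> BITS) % (p - 1), (chrom & LOW32) % p

    def evaluate(pop):
        return [(n_collisions(keys, a, b, p, N), a, b) for a, b in pop]

    pop = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(pop_size)]
    scored = evaluate(pop)
    best = min(scored)
    for _ in range(generations):
        # Assumed fitness n_filled / (n_collisions + 1), with n_filled = n - c.
        weights = [(n - c) / (c + 1) for c, _a, _b in scored]
        nxt = []
        while len(nxt) < pop_size:
            (a1, b1), (a2, b2) = rng.choices(pop, weights=weights, k=2)  # roulette
            x, y = (a1 << BITS) | b1, (a2 << BITS) | b2
            if rng.random() < pc:                    # single-point crossover
                low = (1 << rng.randrange(1, 2 * BITS)) - 1
                x, y = (x & ~low) | (y & low), (y & ~low) | (x & low)
            for child in (x, y):
                if rng.random() < pm:                # single-bit mutation
                    child ^= 1 << rng.randrange(2 * BITS)
                nxt.append(fold(child))
        pop = nxt[:pop_size]
        scored = evaluate(pop)
        best = min(best, min(scored))
    return best

keys = random.Random(1).sample(range(1, 50000), 300)   # hypothetical input file
ga = run_ga(keys, N=1423, p=98869)
rnd = random_pick(keys, N=1423, p=98869)
print("random:", rnd[0], "GA:", ga[0])
```

Because both functions use the same seed, the random pick coincides with the first individual of the GA's initial population, so in this sketch the GA result can never be worse than the baseline. A fairer comparison would average the baseline over many independent draws.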
In the End…
Conclusion, Future Work, Acknowledgment
Conclusion
The proposed algorithm produces an efficient universal hash function for hashing a given distribution of keys, resulting in relatively few collisions.
The problem of clustering is avoided by generating the hash function with a metaheuristic, in this case a Genetic Algorithm.
It performs better than random selection of h.
The algorithm is well suited to scenarios where the input distribution to be hashed changes frequently and the hash function needs to be changed dynamically to rehash the input.
Future Work
The scope for future work on this algorithm includes:
- selection of an efficient sieve algorithm
- an efficient encoding of the chromosome
- understanding the effect of the various types of crossover and mutation on the result
- better design of the fitness function so that the few exceptional cases are also taken care of
- testing the algorithm against some standard hash functions
Acknowledgment
My sincere thanks to Mr. Ramprasad Joshi, my mentor and guide for this project.
I also thank my colleague Miss Joanna Mary Oommen for assistance with the paper and presentation.
Thank You!
Any Questions?