Hash Maps and Performance - Montana Technological … HashMaps-Performance.pdfOverview 2 Most useful...


Hash Maps and Performance

http://www.flickr.com/photos/exoticcarlife/3270764550/

http://www.flickr.com/photos/roel1943/5436844655/

Overview

Most useful ADTs:
  Dynamically sized lists
    Handy when you don’t know how big to make an array, e.g. Java’s ArrayList
  Maps
    Handy when you want to (quickly) map a key to a value

Java HashMap
  A generic with two data types, <key, value>
  Iterating over items in a HashMap

Performance analysis
  Why we care
  What we measure and how
  How functions grow

Empirical analysis
  The doubling hypothesis
  Order of growth

Mapping keys to values

Common problem: map one thing to another
  Often from a very large table of key/value pairs
  Often we want to do this really fast

Application     Purpose                         Key               Value
------------    ----------------------------    --------------    ----------------
phone book      look up phone number            name              phone number
dictionary      look up word                    word              definition
zip code        map city to a zip code          city              zip code
login screen    check user knows password       username          password
file system     find file on disk               filename          location on disk
web search      find all relevant pages         search keywords   list of pages
DNS             find IP address given URL       URL               IP address
reverse DNS     find URL given an IP address    IP address        URL


Collections API: useful data types for storing stuff

List: Knows about the order of things

Set: Knows if something is in the set or not

Map: Knows how to get from a key to a value
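The three collection types above can be contrasted in a few lines. This is only a minimal sketch: the class name CollectionsSketch, the method names, and the sample data are made up for illustration, and ArrayList, HashSet and HashMap are just one possible choice of implementation for each interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionsSketch
{
    // List: remembers the order in which items were added
    public static List<String> makeList()
    {
        List<String> list = new ArrayList<String>();
        list.add("dog");
        list.add("cat");
        list.add("dog");        // duplicates are allowed in a list
        return list;
    }

    // Set: only knows whether something is in the set or not
    public static Set<String> makeSet()
    {
        Set<String> set = new HashSet<String>();
        set.add("dog");
        set.add("cat");
        set.add("dog");         // adding the same item again has no effect
        return set;
    }

    // Map: gets from a key to a value
    public static Map<String, Integer> makeMap()
    {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("dog", 4);      // hypothetical data: number of legs
        map.put("bird", 2);
        return map;
    }

    public static void main(String [] args)
    {
        System.out.println("list size: " + makeList().size());
        System.out.println("set size:  " + makeSet().size());
        System.out.println("set has cat? " + makeSet().contains("cat"));
        System.out.println("bird -> " + makeMap().get("bird"));
    }
}
```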


HashMap Example 1

Goal: Map a domain name to an IP address

    import java.util.*;

    public class HashMapDNS
    {
        public static void main(String [] args)
        {
            HashMap<String, String> map = new HashMap<String, String>();

            map.put("www.cnn.com",    "157.166.255.19");
            map.put("cs.mtech.edu",   "150.131.202.136");
            map.put("www.google.com", "74.125.224.240");
            map.put("google.com",     "74.125.224.240");

            map.put("www.cnn.com",    "127.0.0.1");

            while (!StdIn.isEmpty())
            {
                String str = StdIn.readLine();
                System.out.println(str + " -> " + map.get(str));
            }
        }
    }

HashMap is a generic requiring two types. Here both types are the same, but in general they could be different.

put() adds a key/value mapping

get() returns the value for a key, or null if key not in the map.

Repeating put() with same key replaces existing value of 157.166.255.19 with 127.0.0.1

HashMap

Why do we love HashMaps?

From the javadocs: "This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets."

What does this mean?

Even if there are lots and lots of keys:
  We can find a value very quickly (constant time)!

Contrast with storing in an array of objects:
  Each object stores a name and a value
  Have to go through ½ of the array on average to find the target
  Linear in the number of items in the array
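The contrast between a linear scan and a HashMap lookup can be tried directly. The following is a rough sketch, not a careful benchmark: the class name LookupSketch, the helper findLinear, the size N and the generated keys are all assumptions made for illustration, and the timings of a single lookup will be noisy.

```java
import java.util.HashMap;

public class LookupSketch
{
    // Array approach: scan the keys one at a time until the target is found.
    // Returns the matching value, or null if the key is absent.
    public static String findLinear(String [] keys, String [] values, String target)
    {
        for (int i = 0; i < keys.length; i++)
            if (keys[i].equals(target))
                return values[i];
        return null;
    }

    public static void main(String [] args)
    {
        final int N = 200000;               // arbitrary size for illustration
        String [] keys   = new String[N];
        String [] values = new String[N];
        HashMap<String, String> map = new HashMap<String, String>();

        for (int i = 0; i < N; i++)
        {
            keys[i]   = "key" + i;          // made-up keys and values
            values[i] = "value" + i;
            map.put(keys[i], values[i]);
        }

        String target = "key" + (N - 1);    // last item: worst case for the scan

        long start = System.nanoTime();
        String a = findLinear(keys, values, target);
        long mid = System.nanoTime();
        String b = map.get(target);
        long end = System.nanoTime();

        System.out.println("linear scan: " + a + " in " + (mid - start) + " ns");
        System.out.println("HashMap get: " + b + " in " + (end - mid) + " ns");
    }
}
```

Doubling N should roughly double the scan time for the worst-case key, while the HashMap lookup should stay about the same.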

Order of growth

Doubling hypothesis ratio   Hypothesis        Order of growth
-------------------------   ---------------   ---------------
1                           T = a             constant
1                           T = a log N       logarithmic
2                           T = a N           linear
2                           T = a N log N     linearithmic
4                           T = a N^2         quadratic
8                           T = a N^3         cubic
2^N                         T = a 2^N         exponential

HashMap Example 2

Goal: Type animal, show image & play sound

    public class Animal
    {
        private String image = "";
        private String audio = "";

        public Animal(String image, String audio)
        {
            this.image = image;
            this.audio = audio;
        }

        // Accessors, needed by subclasses because the fields are private
        public String getImage() { return image; }
        public String getAudio() { return audio; }

        // Draw the image centered in the window and play the sound
        public void show()
        {
            StdDraw.picture(0.5, 0.5, image);
            StdAudio.play(audio);
        }
    }

Polymorphic Collections

A collection object can store:
  Any object of its declared type or any subclass of that type

Goal: normal animals, plus spinning version

    public class AnimalSpin extends Animal
    {
        private static final int STEPS      = 100;
        private static final int DELAY_STEP = 10;

        public AnimalSpin(String image, String audio)
        {
            super(image, audio);
        }

        public void show()
        {
            StdAudio.play(getAudio());
            double angle = 0.0;
            for (int i = 0; i <= STEPS; i++)
            {
                StdDraw.clear();
                StdDraw.picture(0.5, 0.5, getImage(), angle);
                angle += 360.0 / STEPS;
                StdDraw.show(DELAY_STEP);
            }
        }
    }

Spinning Animals

Main program remains the same:
  Except for 4 new entries in map
  Must use a different String key since we want to map to a different object capable of spinning

    HashMap<String, Animal> map = new HashMap<String, Animal>();

    map.put("dog",   new Animal("dog.jpg",  "dog.wav"));
    map.put("cat",   new Animal("cat.jpg",  "cat.wav"));
    map.put("cow",   new Animal("cow.jpg",  "cow.wav"));
    map.put("frog",  new Animal("frog.jpg", "frog.wav"));

    map.put("dog2",  new AnimalSpin("dog.jpg",  "dog.wav"));
    map.put("cat2",  new AnimalSpin("cat.jpg",  "cat.wav"));
    map.put("cow2",  new AnimalSpin("cow.jpg",  "cow.wav"));
    map.put("frog2", new AnimalSpin("frog.jpg", "frog.wav"));
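Why the main program can stay the same is easier to see in a version that runs without the drawing and audio libraries. The sketch below uses hypothetical stand-in classes, Pet and SpinningPet, that return a string instead of drawing; they are not the Animal and AnimalSpin classes from the slides, and the helper names buildMap and showFor are also made up.

```java
import java.util.HashMap;

public class PolymorphicMapSketch
{
    // Hypothetical stand-in for Animal: returns text instead of drawing
    static class Pet
    {
        protected final String name;

        Pet(String name) { this.name = name; }

        String show() { return "showing " + name; }
    }

    // Hypothetical stand-in for AnimalSpin: overrides show()
    static class SpinningPet extends Pet
    {
        SpinningPet(String name) { super(name); }

        @Override
        String show() { return "spinning " + name; }
    }

    // The map is declared with the parent type but also holds subclass objects
    public static HashMap<String, Pet> buildMap()
    {
        HashMap<String, Pet> map = new HashMap<String, Pet>();
        map.put("dog",  new Pet("dog"));
        map.put("dog2", new SpinningPet("dog"));
        return map;
    }

    // Same lookup code for every key; the object's class decides which show() runs
    public static String showFor(HashMap<String, Pet> map, String key)
    {
        Pet pet = map.get(key);
        if (pet == null)                    // get() returns null for a missing key
            return "unknown animal: " + key;
        return pet.show();
    }

    public static void main(String [] args)
    {
        HashMap<String, Pet> map = buildMap();
        System.out.println(showFor(map, "dog"));
        System.out.println(showFor(map, "dog2"));
        System.out.println(showFor(map, "emu"));
    }
}
```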

Iteration

HashMap stores (key, value) pairs

Common operations:
  Loop over all the keys
  Loop over all the values
  Loop over all the (key, value) pairs

Key        Value
-------    ---------------------------------
"dog"      Animal("dog.jpg","dog.wav")
"cat"      Animal("cat.jpg","cat.wav")
"frog"     Animal("frog.jpg","frog.wav")
"dog2"     AnimalSpin("dog.jpg","dog.wav")
"cat2"     AnimalSpin("cat.jpg","cat.wav")
"frog2"    AnimalSpin("frog.jpg","frog.wav")

We want to do these operations using only the HashMap itself! (without knowing the actual keys)

HashMap iteration example program

    import java.util.*;

    public class HashMapIter
    {
        public static void main(String [] args)
        {
            HashMap<String, Integer> map = new HashMap<String, Integer>();

            map.put("Snickers", 3);
            map.put("Mars", 1);
            map.put("Butterfinger", 3);
            map.put("Reese's Pieces", 10);

            System.out.println("All keys:");
            for (String key : map.keySet())
                System.out.println(key);

            System.out.println("\nAll values:");
            for (int value : map.values())
                System.out.println(value);

            System.out.println("\nAll (key, value) pairs:");
            for (Map.Entry<String, Integer> entry : map.entrySet())
                System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }

Output:

    All keys:
    Reese's Pieces
    Snickers
    Mars
    Butterfinger

    All values:
    10
    3
    1
    3

    All (key, value) pairs:
    Reese's Pieces -> 10
    Snickers -> 3
    Mars -> 1
    Butterfinger -> 3

From the javadoc: "This class makes no guarantees as to the order of the map"
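Because the iteration order is not guaranteed, a program that needs a predictable order has to impose one itself. One possible approach, sketched here under the assumption that alphabetical key order is what is wanted, is to copy the keys into a list and sort it. The class name SortedKeysSketch and the helpers candyMap and sortedKeys are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

public class SortedKeysSketch
{
    // Same sample data as the iteration example
    public static HashMap<String, Integer> candyMap()
    {
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        map.put("Snickers", 3);
        map.put("Mars", 1);
        map.put("Butterfinger", 3);
        map.put("Reese's Pieces", 10);
        return map;
    }

    // Copy the keys into a list and sort it, so the order no longer
    // depends on how the HashMap happens to store its entries
    public static List<String> sortedKeys(HashMap<String, Integer> map)
    {
        List<String> keys = new ArrayList<String>(map.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String [] args)
    {
        HashMap<String, Integer> map = candyMap();
        for (String key : sortedKeys(map))
            System.out.println(key + " -> " + map.get(key));
    }
}
```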

The Challenge of Programming (One of Many?)

Q: Will my program be able to solve a large practical problem?

compile, debug, solve problems in practice

Key insight [Knuth 1970s]: Use the scientific method to understand performance.

Scientific Method

Scientific method
  Observe some feature of the natural world
  Hypothesize a model that is consistent with the observations
  Predict events using the hypothesis
  Verify the predictions by making further observations
  Validate by repeating until hypothesis and observations agree

Principles
  Experiments must be reproducible
  Hypotheses must be falsifiable

Hypothesis: All swans are white

Why Performance Analysis

Predicting performance
  When will my program finish?
  Will my program finish?

Compare algorithms
  Should I change to a more complicated algorithm?
  Will it be worth the trouble?

Basis for inventing new ways to solve problems
  Enables new technology
  Enables new research

Algorithmic Successes

Sorting
  Rearrange an array of N items in ascending order
  Applications: databases, scheduling, statistics, genomics, …
  Brute force: N^2 steps
  Mergesort: N log N steps, enables new technology

John von Neumann (1945)

Algorithmic Successes

Discrete Fourier transform
  Break down a waveform of N samples into periodic components
  Applications: DVD, JPEG, MRI, astrophysics, …
  Brute force: N^2 steps
  FFT algorithm: N log N steps, enables new technology

Friedrich Gauss (1805)

Performance Metrics

What do we care about?

Time, how long do I have to wait?
  Measure with a stopwatch (real or virtual)
  Run in a performance profiler
    Often part of an IDE (e.g. Microsoft Visual Studio)
    Sometimes standalone (e.g. gprof)
    Helps you determine the bottleneck in your code

Measuring how long some code takes:

    long start = System.currentTimeMillis();

    // Do the stuff you want to time

    long now = System.currentTimeMillis();
    double elapsedSecs = (now - start) / 1000.0;
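The stopwatch fragment can be wrapped in a small reusable helper. This is a sketch rather than the course's own timing code: the class name StopwatchSketch and the method timeSeconds are made up, and it swaps in System.nanoTime(), which the Java documentation recommends for measuring elapsed time, in place of System.currentTimeMillis().

```java
public class StopwatchSketch
{
    // Run the task once and return the elapsed wall-clock time in seconds
    public static double timeSeconds(Runnable task)
    {
        long start = System.nanoTime();
        task.run();
        long end = System.nanoTime();
        return (end - start) / 1e9;     // nanoseconds to seconds
    }

    public static void main(String [] args)
    {
        // The work being timed is an arbitrary loop chosen for illustration
        double secs = timeSeconds(() ->
        {
            long sum = 0;
            for (int i = 0; i < 100000000; i++)
                sum += i;
            System.out.println("sum = " + sum);
        });
        System.out.println("elapsed: " + secs + " secs");
    }
}
```

A single measurement like this is noisy, so in practice it helps to repeat the run a few times and compare.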

Performance Metrics

What do we care about?

Space, do I have the resources to solve it?
  Usually we care about physical memory
    8 GB = 8.6 billion places to store a byte (byte = 256 possibilities)
    Java double, 64 bits = 8 bytes
    8 GB / 8 bytes = over 1 billion doubles!
  Can swap to disk for some extra space
    But much, much slower

Stats.java class provides measurement of time and memory usage.
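The memory arithmetic, and a rough way to observe heap usage, can be sketched with the standard Runtime class. This is not the Stats.java class mentioned above, whose contents are not shown here; the class name MemorySketch and its methods are assumptions for illustration, and the heap measurement is only approximate because the garbage collector can run at any time.

```java
public class MemorySketch
{
    // How many 8-byte doubles fit in the given number of bytes
    public static long doublesThatFit(long bytes)
    {
        return bytes / Double.BYTES;
    }

    // Approximate number of heap bytes currently in use by this JVM
    public static long usedHeapBytes()
    {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String [] args)
    {
        long eightGB = 8L * 1024 * 1024 * 1024;
        System.out.println("bytes in 8 GB:    " + eightGB);
        System.out.println("doubles that fit: " + doublesThatFit(eightGB));

        long before = usedHeapBytes();
        double [] data = new double[1000000];   // about 8 million bytes of doubles
        long after = usedHeapBytes();
        System.out.println("approx bytes used by " + data.length
                           + " doubles: " + (after - before));
    }
}
```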

A "Simple" Problem

Three-sum problem
  Given N integers, find all triples that sum to 0

    % more 8ints.txt
    8
    30 -30 -20 -10 40 0 10 5

    % java ThreeSum < 8ints.txt
    30 -30 0
    30 -20 -10
    -30 -10 40
    -10 0 10

Brute force algorithm: Try all possible triples and see if they sum to 0.

Three Sums: Brute-Force

    public class ThreeSum
    {
        public static void main(String [] args)
        {
            int N = StdIn.readInt();
            int [] nums = new int[N];
            for (int i = 0; i < N; i++)
                nums[i] = StdIn.readInt();

            for (int i = 0; i < N; i++)
                for (int j = i + 1; j < N; j++)
                    for (int k = j + 1; k < N; k++)
                        if (nums[i] + nums[j] + nums[k] == 0)
                            System.out.println(nums[i] + " " + nums[j] + " " + nums[k]);
        }
    }

All possible triples i < j < k from the set of integers.
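To see how much work "all possible triples i < j < k" amounts to, the loops can be run with a counter in place of the sum test. The sketch below, with made-up names TripleCountSketch, triplesExamined and formula, compares the counted number of triples with the closed form N(N-1)(N-2)/6, which grows like N^3 / 6.

```java
public class TripleCountSketch
{
    // Count the triples i < j < k visited by the brute-force loops
    public static long triplesExamined(int n)
    {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    count++;
        return count;
    }

    // Closed form: the number of ways to choose 3 items out of n
    public static long formula(int n)
    {
        return (long) n * (n - 1) * (n - 2) / 6;
    }

    public static void main(String [] args)
    {
        for (int n = 8; n <= 512; n *= 2)
            System.out.println(n + ": " + triplesExamined(n) + " triples, formula " + formula(n));
    }
}
```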


Empirical Analysis: Three Sum

Run program for various input sizes, 3 machines:
  Keith’s first laptop: Pentium 1, 150 MHz, 80 MB RAM
  Keith’s desktop: Phenom II, 3.2 GHz (3.6 GHz turbo), 32 GB RAM
  My desktop: Intel i7, 3.6 GHz, 16 GB RAM

Running time (seconds):

N       ancient laptop   modern desktop   My desktop
-----   --------------   --------------   ----------
100     0.33             0.01             0.004
200     2.04             0.04             0.008
400     11.23            0.16             0.021
800     94.96            0.63             0.061
1600    734.03           4.33             0.38
3200    5815.30          33.69            2.917
6400    47311.43         263.82           23.23

Doubling Hypothesis

Cheap and cheerful analysis
  Time program for input size N
  Time program for input size 2N
  Time program for input size 4N
  Ratio T(2N) / T(N) approaches a constant
  Constant tells you the exponent in T = a N^b

Constant from ratio   Hypothesis     Order of growth
-------------------   ------------   -----------------
2                     T = a N        linear, O(N)
4                     T = a N^2      quadratic, O(N^2)
8                     T = a N^3      cubic, O(N^3)
16                    T = a N^4      O(N^4)

Keith’s Desktop data

N       T(N)      ratio
-----   -------   -----
400     0.16      -
800     0.63      3.94
1600    4.33      6.87
3200    33.69     7.78
6400    263.82    7.83
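The ratio column and the exponent it implies can be computed mechanically from the timings. This is a minimal sketch using the desktop timings from the table; the class name DoublingSketch and the helpers ratios and exponent are made up. It relies on the fact that if T = a N^b, then T(2N) / T(N) = 2^b, so b is the base-2 logarithm of the ratio.

```java
public class DoublingSketch
{
    // Ratio T(2N) / T(N) for consecutive timings, where N doubles at each step
    public static double [] ratios(double [] times)
    {
        double [] r = new double[times.length - 1];
        for (int i = 1; i < times.length; i++)
            r[i - 1] = times[i] / times[i - 1];
        return r;
    }

    // If T = a N^b then T(2N) / T(N) = 2^b, so b = log2(ratio)
    public static double exponent(double ratio)
    {
        return Math.log(ratio) / Math.log(2.0);
    }

    public static void main(String [] args)
    {
        // Desktop timings in seconds for N = 400, 800, 1600, 3200, 6400
        double [] times = {0.16, 0.63, 4.33, 33.69, 263.82};

        for (double ratio : ratios(times))
            System.out.printf("ratio = %.2f   estimated b = %.2f%n", ratio, exponent(ratio));
    }
}
```

The estimated exponent should approach 3 as N grows, matching the cubic hypothesis.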

Estimating Constant, Making Predictions

Keith’s Desktop data

N       T(N)      ratio
-----   -------   -----
400     0.16      -
800     0.63      3.94
1600    4.33      6.87
3200    33.69     7.78
6400    263.82    7.83

T = a N^3
263.82 = a (6400)^3
a = 1.01 x 10^{-9}

Prediction: How long for desktop to solve a 100,000 integer problem?
1.01 x 10^{-9} (100000)^3 = 1006393 secs = 280 hours

Keith’s Laptop data

N       T(N)        ratio
-----   ---------   -----
400     11.23       -
800     94.96       8.45
1600    734.03      7.72
3200    5815.30     7.92
6400    47311.43    8.14

T = a N^3
47311.43 = a (6400)^3
a = 1.80 x 10^{-7}

Prediction: How long for laptop to solve a 100,000 integer problem?
1.80 x 10^{-7} (100000)^3 = 1.80 x 10^8 secs = 50133 hours
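The fit and the prediction are two lines of arithmetic, which the following sketch spells out for the desktop data. The class name PredictSketch and the helpers fitConstant and predictSeconds are made up; the numbers come from the tables above.

```java
public class PredictSketch
{
    // Solve T = a N^b for a, given one measured (N, T) pair and the exponent b
    public static double fitConstant(double seconds, double n, double b)
    {
        return seconds / Math.pow(n, b);
    }

    // Predict the running time for a new input size using T = a N^b
    public static double predictSeconds(double a, double n, double b)
    {
        return a * Math.pow(n, b);
    }

    public static void main(String [] args)
    {
        // Desktop: 263.82 seconds at N = 6400, cubic hypothesis (b = 3)
        double a = fitConstant(263.82, 6400, 3);
        double secs = predictSeconds(a, 100000, 3);

        System.out.println("a = " + a);
        System.out.println("predicted seconds for N = 100000: " + secs);
        System.out.println("predicted hours: " + secs / 3600.0);
    }
}
```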

Bottom Line

My three sum algorithm sucks
  Does not scale to large problems → an algorithm problem
  15 years of computer progress didn't help much

My algorithm: O(N^3)
A slightly more complicated algorithm: O(N^2 log N)

Using the better algorithm, how long does it take the modern desktop to solve a 100,000 integer problem?
1.01 x 10^{-9} (100000)^2 log_2(100000) = 168 secs

Using the better algorithm, how long does it take the ancient laptop to solve a 100,000 integer problem?
1.80 x 10^{-7} (100000)^2 log_2(100000) = 29897 secs

This assumes the same constant. Really should do the doubling experiment again with the new algorithm.
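The slides do not say which O(N^2 log N) algorithm is meant. One well-known candidate, sketched here as an assumption, sorts the array and then, for each pair, binary-searches for the value that would complete the triple. The class name ThreeSumFastSketch is made up, and the sketch assumes the integers are distinct and small enough that sums do not overflow.

```java
import java.util.Arrays;

public class ThreeSumFastSketch
{
    // Count triples that sum to 0 using sort + binary search.
    // Assumes all integers are distinct.
    public static int count(int [] input)
    {
        int [] a = input.clone();       // leave the caller's array untouched
        Arrays.sort(a);                 // N log N

        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
            {
                // log N search for the value that completes the triple
                int k = Arrays.binarySearch(a, -(a[i] + a[j]));

                // Require k > j so each triple is counted exactly once
                if (k > j)
                    count++;
            }
        return count;
    }

    public static void main(String [] args)
    {
        int [] nums = {30, -30, -20, -10, 40, 0, 10, 5};   // contents of 8ints.txt
        System.out.println(count(nums) + " triples sum to 0");
    }
}
```

Two nested loops over pairs give N^2 iterations, and each one does a log N binary search, which is where the N^2 log N comes from.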

Order of Growth

Doubling hypothesis ratio   Hypothesis        Order of growth
-------------------------   ---------------   --------------------------
1                           T = a             constant, O(1)
1                           T = a log N       logarithmic, O(log N)
2                           T = a N           linear, O(N)
2                           T = a N log N     linearithmic, O(N log N)
4                           T = a N^2         quadratic, O(N^2)
8                           T = a N^3         cubic, O(N^3)
2^N                         T = a 2^N         exponential, O(2^N)


Order of Growth: Consequences


Order of Growth

A small number of functions describe the running time of many fundamental algorithms!

log N:

    while (N > 1)
    {
        N = N / 2;
        ...
    }

N:

    for (int i = 0; i < N; i++)
        ...

N^2:

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            ...

N^3:

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                ...

N log N:

    public static void g(int N)
    {
        if (N == 0) return;
        g(N / 2);
        g(N / 2);
        for (int i = 0; i < N; i++)
            ...
    }

2^N:

    public static void f(int N)
    {
        if (N == 0) return;
        f(N - 1);
        f(N - 1);
        ...
    }
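The growth of the two recursive fragments can be checked by counting work instead of timing it. In this sketch the names RecursionCountSketch, gWork and fCalls are made up: gWork counts the loop iterations done by g, and fCalls counts the calls made by f.

```java
public class RecursionCountSketch
{
    // Total inner-loop iterations performed by g(N): about N log N
    public static long gWork(int n)
    {
        if (n == 0) return 0;
        long work = gWork(n / 2) + gWork(n / 2);   // two half-size subproblems
        for (int i = 0; i < n; i++)
            work++;                                 // N steps at this level
        return work;
    }

    // Total number of calls made by f(N), counting the call itself: about 2^N
    public static long fCalls(int n)
    {
        if (n == 0) return 1;
        return 1 + fCalls(n - 1) + fCalls(n - 1);  // two subproblems of size N - 1
    }

    public static void main(String [] args)
    {
        for (int n = 8; n <= 1024; n *= 2)
            System.out.println("gWork(" + n + ") = " + gWork(n));

        for (int n = 5; n <= 20; n += 5)
            System.out.println("fCalls(" + n + ") = " + fCalls(n));
    }
}
```

For a power of two, gWork(N) comes out to N (log_2 N + 1), and fCalls(N) is 2^(N+1) - 1.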

Growth of Nested Loops

Nested loops
  A good clue to order of growth
  But each loop must execute "on the order of" N times
  If a loop's iteration count is not a linear function of N (e.g. a fixed bound), that loop doesn't cause the order to grow

N^3:

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                count++;

N        T(N)      ratio
------   -------   -----
5000     6.85      -
10000    53.48     7.8
20000    425.97    8.0

425.97 = a (20000)^3
a = 5.32 x 10^{-11}

N^2:

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < 10000; k++)
                count++;

N        T(N)      ratio
------   -------   -----
5000     13.40     -
10000    53.20     3.97
20000    212.49    3.99

212.49 = a (20000)^2
a = 5.31 x 10^{-7}


N^3:

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N / 5; k++)
                count++;

N        T(N)     ratio
------   ------   -----
5000     1.59     -
10000    11.08    6.97
20000    86.36    7.79

86.36 = a (20000)^3
a = 1.08 x 10^{-11}

N^2:

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < 10; k++)
                count++;

N        T(N)    ratio
------   -----   -----
5000     0.11    -
10000    0.37    3.36
20000    1.47    3.97

1.47 = a (20000)^2
a = 3.68 x 10^{-9}
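The same doubling ratios show up if the loop iterations are counted exactly instead of timed, which removes measurement noise. In this sketch the class name NestedLoopSketch and the method countIterations are made up, and the inner loop bound is passed as a parameter so it can either depend on N or stay fixed.

```java
public class NestedLoopSketch
{
    // Run the triple loop and return how many times count++ executed.
    // innerBound may depend on n (e.g. n or n / 5) or be a constant.
    public static long countIterations(int n, int innerBound)
    {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < innerBound; k++)
                    count++;
        return count;
    }

    public static void main(String [] args)
    {
        int n = 100;    // small size chosen so the sketch runs quickly

        // Inner bound grows with N: doubling N multiplies the work by 8
        System.out.println("k < N:     ratio "
            + countIterations(2 * n, 2 * n) / countIterations(n, n));
        System.out.println("k < N / 5: ratio "
            + countIterations(2 * n, 2 * n / 5) / countIterations(n, n / 5));

        // Inner bound is fixed: doubling N multiplies the work by only 4
        System.out.println("k < 10:    ratio "
            + countIterations(2 * n, 10) / countIterations(n, 10));
    }
}
```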


Recap on Performance

Introduction to Analysis of Algorithms
  Today: simple empirical estimation
  Next year: an entire semester course

The algorithm matters
  Faster computer only buys you out of trouble temporarily
  Better algorithms enable new technology!

The data structure matters

Doubling hypothesis
  Measure time ratio as you double the input size
  If the ratio = 2^b, runtime of algorithm T(N) = a N^b

Summary

Most useful ADTs:
  Dynamically sized lists
    Handy when you don’t know how big to make an array, e.g. Java’s ArrayList
  Maps
    Handy when you want to (quickly) map a key to a value

Java HashMap
  A generic with two data types, <key, value>
  Iterating over items in a HashMap

Performance analysis
  Why we care
  What we measure and how
  How functions grow

Empirical analysis
  The doubling hypothesis
  Order of growth