SCALABILITY AND MEMORY
CS FUNDAMENTALS SERIES
http://bit.ly/1TPJCe6
HOW DO YOU MEASURE AN ALGORITHM?
???
CLOCK TIME?
DEPENDS ON WHO’S COUNTING.
ALSO, TOO FLAKY EVEN ON THE SAME
MACHINE.
THE NUMBER OF LINES?
THIS IS TWO LINES, BUT A WHOLE LOT OF STUPID.
THE NUMBER OF CPU CYCLES?
DEPENDS ON THE RUNTIME.
ALL THESE METHODS SUCK.
NONE OF THEM CAPTURE WHAT WE ACTUALLY CARE ABOUT.
ENTER BIG O!
ASYMPTOTIC ANALYSIS
▸ Big O is about asymptotic analysis
▸ In other words, it’s about how an algorithm scales when the numbers get huge
▸ You can also describe this as “the rate of growth”
▸ How fast do the numbers become unmanageable?
ASYMPTOTIC ANALYSIS
▸ Another way to think about this is:
▸ What happens when your input size is 10,000,000? Will your program be able to resolve?
▸ It’s about scalability, not necessarily speed
PRINCIPLES OF BIG O
▸ Big O is a kind of mathematical notation
▸ In computer science, it essentially means “the asymptotic rate of growth”
▸ In other words, how does the running time of this function scale with the input size when the numbers get big?
▸ Big O notation looks like this:
O(n), O(n log n), O(n²)
PRINCIPLES OF BIG O
▸ n here refers to the input size
▸ Can be the size of an array, the length of a string, the number of bits in a number, etc.
▸ O(n) means the algorithm scales linearly with the input
▸ Think like a line (y = x)
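A quick illustrative sketch of O(n) (the function name `max` here is just an example, not from the deck): one pass over the array, a constant amount of work per element, so the running time grows linearly with the input size.

```javascript
// O(n): one pass over the input, constant work per element.
function max(arr) {
  let best = -Infinity;
  for (const x of arr) {
    if (x > best) best = x; // one constant-time comparison per element
  }
  return best;
}

console.log(max([3, 9, 2, 7])); // 9
```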
PRINCIPLES OF BIG O
▸ “Scaling linearly” can mean 1:1 (one iteration per extra input), but it doesn’t necessarily
▸ It can simply mean k:1 where k is constant, like 3:1 or 5:1 (i.e., a constant amount of time per extra input)
PRINCIPLES OF BIG O
▸ In Big O, we strip out any coefficients or smaller factors.
▸ The fastest-growing factor wins. This is also known as the dominant factor.
▸ Just think, when the numbers get huge, what dwarfs everything else?
▸ O(5n) => O(n)
▸ O(½n - 10) also => O(n)
PRINCIPLES OF BIG O
▸ O(k) where k is any constant reduces to O(1).
▸ O(200) = O(1)
▸ Where there are multiple factors of growth, the most dominant one wins.
▸ O(n⁴ + n² + 40n) = O(n⁴)
PRINCIPLES OF BIG O
▸ If there are two inputs (say you’re trying to find all the common substrings of two strings), then you use two variables in your Big O notation => O(n * m)
▸ Doesn’t matter if one variable probably dwarfs the other. You always include both.
▸ O(n + m) => this is considered linear
▸ O(2ⁿ + log(m)) => this is considered exponential
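A sketch of what O(n * m) looks like in code. Finding common substrings takes more machinery, so this uses a simpler stand-in (a hypothetical `countMatchingPairs` of my own invention): a nested loop over two independent inputs, where every element of one is compared against every element of the other.

```javascript
// O(n * m): a doubly nested loop over two independent inputs.
// Counts positions (i, j) where the characters of s and t match.
function countMatchingPairs(s, t) {
  let count = 0;
  for (let i = 0; i < s.length; i++) {   // n iterations...
    for (let j = 0; j < t.length; j++) { // ...times m iterations each
      if (s[i] === t[j]) count++;
    }
  }
  return count;
}

console.log(countMatchingPairs("ab", "abc")); // 2
```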
COMPREHENSION TEST
Convert each of these to their appropriate Big O form!
▸ O(3n + 5)
▸ O(n + ⅕n²)
▸ O(log(n) + 5000)
▸ O(2m³ + 50 + ½n)
▸ O(n log(m) + 2m² + nm)
▸ What should n be for this function?
Let’s break it down.
Make an empty array.
For each character in the string…
Unshift them into an array… And then join the array together.
▸ Initialize an empty array => O(1)
▸ Then, split the string into an array of characters => O(n)
▸ Then for each character => O(n)
▸ Unshift into an array => O(n)
We’ll see later why this is.
▸ Then join the characters into a string => O(n)
These multiply. => O(n²)
▸ O(n² + 2n) = O(n²)
▸ This algorithm is quadratic.
▸ Let’s see how badly it sucks.
Benchmark away!
(showSlowReverse.js)
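The actual showSlowReverse.js isn’t reproduced in the deck; a minimal sketch of the algorithm just analyzed probably looks something like this. The unshift is the culprit: each call shifts every existing element over by one, so n unshifts cost O(n) each.

```javascript
// Quadratic string reverse: each unshift shifts every element already
// in the array, so n unshifts at O(n) apiece => O(n^2) overall.
function slowReverse(str) {
  const result = [];                 // O(1)
  for (const ch of str.split("")) {  // O(n) split, then n iterations
    result.unshift(ch);              // O(n) per call: the quadratic part
  }
  return result.join("");            // O(n)
}

console.log(slowReverse("hello")); // "olleh"
```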
TIME COMPLEXITIES WAY TOO FAST
Constant O(1): math, pop, push, arr[i], property access, conditionals, initializing a variable
Logarithmic O(log n): binary search
Linear O(n): linear search, iteration
Linearithmic O(n log n): sorting (merge sort, quick sort)
Quadratic O(n²): nested looping, bubble sort
Cubic O(n³): triply nested looping, matrix multiplication
Polynomial O(nᵏ): all “efficient” algorithms
Exponential O(2ⁿ): subsets, solving chess
Factorial O(n!): permutations
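For the O(log n) row, here’s a quick sketch of binary search: each step halves the remaining range, so a sorted array of a million elements takes at most ~20 comparisons.

```javascript
// O(log n): binary search halves the candidate range on every step.
function binarySearch(sorted, target) {
  let lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;            // midpoint of the range
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1; // discard the lower half
    else hi = mid - 1;                      // discard the upper half
  }
  return -1; // not found
}

console.log(binarySearch([1, 3, 5, 7, 9], 7)); // 3
```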
TIME TO IDENTIFY TIME COMPLEXITIES
OPTIMIZATIONS DON’T ALWAYS MATTER
BOTTLENECKS
▸ A bottleneck is the part of your code where your algorithm spends most of its time.
▸ Asymptotically, it’s wherever the dominant factor is.
▸ If your algorithm has an O(n) part and an O(50) part, the bottleneck is the O(n) part.
▸ As n → ∞, your algorithm will eventually spend 99%+ of its time in the bottleneck.
BOTTLENECKS
▸ When trying to optimize or speed up an algorithm, focus on the bottleneck.
▸ Optimizing code outside the bottleneck will have a minuscule effect.
▸ Bottleneck optimizations, on the other hand, can easily be huge!
BOTTLENECKS
▸ If you cut down non-bottleneck code, you might be able to save .01% of your runtime.
▸ If you cut down on bottleneck code, you might be able to save 30% of your runtime.
▸ Better yet, try to lower the time complexity altogether if you can!
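An illustrative sketch (my own example, not from the deck) of where the bottleneck rule points: this function has an O(n) filter and an O(n log n) sort. As n grows, nearly all the time goes into the sort, so that’s the only part worth optimizing.

```javascript
// The filter is O(n); the sort is O(n log n) and dominates for big n.
// Speeding up the filter barely moves the needle; the sort is the bottleneck.
function medianOfPositives(arr) {
  const positives = arr.filter((x) => x > 0);     // O(n) scan
  positives.sort((a, b) => a - b);                // O(n log n) bottleneck
  return positives[Math.floor(positives.length / 2)];
}

console.log(medianOfPositives([5, -2, 1, 9, 3])); // 5
```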
BOTTLENECK EXERCISE
SPACE COMPLEXITY
▸ Same thing, except now with memory instead of time.
▸ Do you take linear extra space relative to the input?
▸ Do you allocate new arrays? Do you have to make a copy of the original input? Are you creating nested data structures?
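A quick sketch of the contrast (both function names are illustrative): one function uses O(1) extra space no matter how big the input is; the other allocates a new array as large as its input, so it uses O(n) extra space.

```javascript
// O(1) extra space: just a couple of variables, regardless of input size.
function sum(arr) {
  let total = 0;
  for (const x of arr) total += x;
  return total;
}

// O(n) extra space: map allocates a whole new array the size of the input.
function doubled(arr) {
  return arr.map((x) => x * 2);
}

console.log(sum([1, 2, 3]));     // 6
console.log(doubled([1, 2, 3])); // [2, 4, 6]
```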
COMPREHENSION CHECK
▸ What is the space complexity of:
▸ max(arr)
▸ firstFive(arr)
▸ substrings(str)
▸ hasVowel(str)
SO WHAT THE HELL IS MEMORY ANYWAY
TO UNDERSTAND MEMORY, WE NEED TO UNDERSTAND HOW A COMPUTER IS STRUCTURED.
Data Layers
▸ Registers: immediate workspace. A CPU usually has 16 of these. (1 cycle)
▸ L1 cache: a nearby reservoir of useful data we’ve recently read. Close by. (~4 cycles)
▸ L2 cache: more nearby data, but a little farther away. (~10 cycles)
▸ RAM: getting pretty far now. It’s completely random-access, but takes a while. (~800 cycles)
▸ Disk: on an SSD, you’re looking at ~5,000 cycles. This is pretty much another country. And on a spindle drive, it’s more like 50,000.
SO ALL DATA TAKES A JOURNEY UP FROM THE HARD DISK TO
EVENTUALLY LIVE IN A REGISTER.
WHAT DOES MEMORY ACTUALLY LOOK LIKE?
IT’S JUST A BUNCH OF CELLS WITH SHIT IN ‘EM.
IT’S ALL BINARY DATA.
STRINGS, FLOATS, OBJECTS, THEY’RE ALL STORED AS BINARY.
AND IT’S ALL STORED CONTIGUOUSLY.
THIS IS VERY IMPORTANT WHEN IT COMES TO ARRAYS.
ARRAYS ARE JUST CONTIGUOUS BLOCKS OF
MEMORY.
THAT’S WHY THEY’RE SO FAST.
Assume each of these cells is 8 bytes (64 bits)
Let’s imagine they’re addressed like so…
832968 833032 833096 833160 833224 833288 833352 833416 833480 833544
Each cell is offset by exactly 64 in the address space
Meaning you can easily derive the address of any index
this.startAddr = 833096;
function get(i) {
  return this.startAddr + i * 64;
}
get(3) = 833096 + 3 * 64 = 833096 + 192 = 833288
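The slide’s get function, made runnable as a standalone sketch. The addresses are the deck’s hypothetical ones (real machines address bytes, but the idea is the same): index to address in one multiply and one add, with no scanning.

```javascript
// Pointer arithmetic: constant-time index -> address.
const startAddr = 833096; // hypothetical start address from the slide
const CELL_OFFSET = 64;   // each cell is offset by 64 in this address space

function get(i) {
  return startAddr + i * CELL_OFFSET; // O(1): one multiply, one add
}

console.log(get(3)); // 833096 + 192 = 833288
```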
THIS IS POINTER ARITHMETIC.
THIS IS WHAT MAKES ARRAY LOOKUPS O(1)
AND IT’S WHY ARRAYS ARE BY FAR THE FASTEST DATA
STRUCTURE
LET’S WRAP UP BY TALKING ABOUT CACHE
EFFICIENCY.
CACHES ARE DUMB.
When the CPU needs data, it first looks in the cache.
Say it’s not in the cache. This is called a cache miss.
The cache then loads the data the CPU requested from RAM…
But the cache guesses that if the CPU wanted this data, it probably will also want other nearby data eventually. It would be stupid to have to make multiple round trips.
In other words, the cache assumes that related data will be stored around the same physical area.
The cache assumes locality of data.
So the cache just loads a huge contiguous chunk of data around the address the CPU asked for.
OK. SO?
Remember this?
Loading from memory is slow as shit.
We really want to minimize cache misses.
SO KEEP YOUR DATA LOCAL AND YOUR DATA STRUCTURES
CONTIGUOUS.
ARRAYS ARE KING, BECAUSE ALL OF THE DATA IS LITERALLY RIGHT NEXT
TO EACH OTHER IN MEMORY!
An algorithm that jumps around in memory or follows a bunch of pointers to other objects will trigger lots of cache misses!
Think linked lists, trees, even hash maps.
IDEALLY, YOU WANT TO WORK LOCALLY WITHIN ARRAYS OF
CONTIGUOUS DATA.
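A sketch of the locality idea (variable names are mine; exact behavior depends on the JS engine, but the principle holds for contiguous storage like typed arrays): both functions below compute the same sum over a flat “2D” block, but the row-order walk touches memory sequentially while the column-order walk strides across it, and on large inputs the strided version triggers far more cache misses.

```javascript
// A "2D" grid stored as one flat, contiguous Float64Array.
const N = 4;
const grid = Float64Array.from({ length: N * N }, (_, i) => i);

// Cache-friendly: walks the block in memory order.
function sumRowOrder(a, n) {
  let s = 0;
  for (let r = 0; r < n; r++)
    for (let c = 0; c < n; c++) s += a[r * n + c]; // sequential access
  return s;
}

// Same result, worse locality: strides across the block.
function sumColOrder(a, n) {
  let s = 0;
  for (let c = 0; c < n; c++)
    for (let r = 0; r < n; r++) s += a[r * n + c]; // strided access
  return s;
}

console.log(sumRowOrder(grid, N)); // 120
console.log(sumColOrder(grid, N)); // 120 (same answer; slower at scale)
```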
LET’S DO A QUICK EXERCISE.
QUESTIONS?
I AM HASEEB QURESHI
You can find me on Twitter: @hosseeb
You can read my blog at: haseebq.com
PLEASE DONATE IF YOU GOT SOMETHING OUT OF THIS
<3
Ranked by GiveWell as the most
efficient charity in the world!