Strictness-Unboxed explained

Strictness/Unboxed Explained Paul Meng @MnO2XMnO2

Lazy Evaluation in Haskell

In the late ’70s and early ’80s, .... A series of seminal publications ignited an explosion of interest in the idea of lazy (or non-strict, or call-by-need) functional languages as a vehicle for writing serious programs.

A History of Haskell: Being Lazy With Class

● Lazy evaluation was once a hot research topic in the academic world, and it became the foundation of Haskell’s design.

● There are both Data.ByteString and Data.ByteString.Lazy. Why?

But what exactly does this mean?

Let’s start with a metaphor

● You are buying lunch for your starving colleagues at the office. You go to a McDonald’s, head to the counter, order a bunch of things, and ask for it to go.

● Then the clerk gives you this big paper bag. You don’t bother to check it; you just take it and run.

● Then you are doing non-strict evaluation!

You are in the office now

● Evaluation begins.

● Wait! What’s your definition of evaluation? In this metaphor, one step of evaluation is to open a bag or open a box.

● Let’s see what would happen!

The First Step

Take the inner paper bag out of the outermost paper bag. Seems no problem.

The Second Step

Take the meal boxes out of the paper bag. Seems no problem either.

The Third Step

No problem at all.

The Fourth Step

WTF??!

Non-strict semantics

You read on the HaskellWiki that strict semantics is

And non-strict semantics is

The upside-down T symbol (⊥) is called “bottom”. It stands for something undefined, or a non-terminating program. In this case, it is just giving the finger. (See, it looks like a finger, right?)

Now you can tell the difference between non-strict and strict.

Strict: evaluate the bags at the counter, and catch the error. Finger sent!

Non-strict: no evaluation at the counter. Happy face.
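In Haskell terms, the difference at the counter can be sketched roughly as follows. This is only an illustration: ignoreBag and openBag are hypothetical names that do not appear in the slides, and undefined stands in for bottom.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical example functions, not from the slides.

-- Non-strict in its argument: the bag is never opened.
ignoreBag :: Int -> Int
ignoreBag _ = 42

-- Strict in its argument: adding 1 has to open the bag first.
openBag :: Int -> Int
openBag x = x + 1

main :: IO ()
main = do
  -- 'undefined' plays the role of bottom here.
  print (ignoreBag undefined)
  -- Forcing openBag's result forces its argument, so bottom is reached.
  r <- try (evaluate (openBag undefined)) :: IO (Either SomeException Int)
  putStrLn (either (const "bottom reached") show r)
```

The first line prints 42 because the argument is never looked at; the second call hits bottom as soon as the result is demanded.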

Back to Haskell

● In the example, evaluation means either opening bags or opening boxes. What is “evaluation” in Haskell?

● What’s the difference between non-strict and lazy?

Evaluation

1+1+1+1+1
= 1+1+1+2
= 1+1+3
= 1+4
= 5

Each step is an evaluation step, or by its fancier name, a “reduction”.

Evaluation

Weak Head Normal Form

What does this alien word mean?

To better explain it, let’s rewrite the last example a bit.

In Haskell, ‘+’ is a function, so 1+1+1+1+1 is actually

+(1, +(1, +(1, +(1, 1))))

Normal Form

The form that can’t be further evaluated (or “reduced”)

5

Head Normal Form

The form that can’t be further evaluated if we only evaluate at the “HEAD” position

+(1, +(1, +(1, +(1, 1))))

The head is the outermost +, that is, the outermost bag.

Here the head normal form = normal form

Trouble

(\x -> 1) (fix (+1))

If we don’t evaluate at the head position first, then we are in trouble.

If we don’t open the outermost bag, maybe there would be infinite hamburgers inside!
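A minimal sketch of the expression above as a runnable program, assuming fix is the one from Data.Function. The names infiniteBurgers and result are made up for this illustration.

```haskell
import Data.Function (fix)

-- fix (+1) unfolds to 1 + (1 + (1 + ...)) and never produces a value.
infiniteBurgers :: Integer
infiniteBurgers = fix (+1)

-- The lambda never looks at its argument, so evaluating at the head
-- position first means the argument stays an unopened bag.
result :: Integer
result = (\_x -> 1) infiniteBurgers

main :: IO ()
main = print result
```

Because Haskell reduces the application at the head first, this prints 1. A language that evaluated the argument first would never get past the infinite hamburgers.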

Trouble

fix (+1)

But even if we evaluate at the head position, we are still not guaranteed to reach a fully evaluated form in Haskell.

Head normal form doesn’t apply to Haskell in general (not for arbitrary terms)

Weak Head Normal Form

Weak = “We are not guaranteed”

Weak Head Normal Form = “We only evaluate at the head position, and only until the outermost constructor or lambda is exposed. If we evaluate further, we are not guaranteed what would happen.”

Schrödinger’s Filet of Fish: a Filet of Fish box could contain a Big Mac! Nothing is guaranteed until we open it.
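The unopened boxes can be sketched in Haskell like this. It is only an illustration: mysteryBoxes and openOuterBag are hypothetical names, and pattern matching with case is used as the thing that evaluates a value to weak head normal form.

```haskell
-- A pair whose two "boxes" both contain bottom.
mysteryBoxes :: (Int, Int)
mysteryBoxes = (undefined, undefined)

-- Matching on the pair only exposes the outermost (,) constructor.
-- The two boxes inside are never opened.
openOuterBag :: String
openOuterBag = case mysteryBoxes of
  (_, _) -> "outer bag opened"

main :: IO ()
main = do
  putStrLn openOuterBag
  -- length walks the list structure but never opens the elements.
  print (length [undefined, undefined :: Int])
```

Both lines run without touching bottom: the program prints "outer bag opened" and then 2.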

Thunk

A thunk is an expression that could still be reduced. (There are still bags!)

1+1+1+1+1

We are used to thinking that the above is computed to the value 5 right away, but not in Haskell. Until the value is needed, it stays what it is: (1+1+1+1+1)

Non-Strict vs Lazy

● Non-strict is about semantics: by definition, it is anything that is not strict.

● There can be many evaluation strategies, and lazy is just one of them.

Non-Strict vs Lazy

Call-by-Need: an argument is not evaluated until it is needed, and then it is evaluated at most once. This is the so-called “lazy evaluation”.

Call-by-Name: the thunk is copied to every place it is used inside the function body.

f x = x + x

call-by-name:
f (1+1+1)
=> (1+1+1) + (1+1+1)
=> 3 + 3

call-by-need:
f (1+1+1)
=> let x = (1+1+1) in x + x
=> x = 3, therefore 3 + 3

call-by-value:
f (1+1+1)
=> f (3)
=> 3 + 3
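One way to watch call-by-need at work, sketched under the assumption that the program is compiled with GHC: wrap the argument in trace from Debug.Trace, which writes a message to stderr when the thunk is evaluated.

```haskell
import Debug.Trace (trace)

f :: Int -> Int
f x = x + x

main :: IO ()
main =
  -- The traced expression is the thunk passed to f. Under call-by-need
  -- it is shared between both uses of x in the body of f.
  print (f (trace "evaluating 1+1+1" (1 + 1 + 1)))
```

With call-by-need the message should show up once, even though x is used twice in the body of f. Under call-by-name it would show up twice.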

Back to Haskell: sum

sum [] = 0
sum (x:xs) = x + sum xs

Not tail-recursive! It creates a stack frame for each recursive call.

sum'

sum' acc [] = acc
sum' acc (x:xs) = sum' (acc+x) xs

The argument (acc+x) is not reduced by default

It is tail-recursive now, but it still has a problem

sum'

sum' 0 [1,2,3,4]
= sum' (0+1) [2,3,4]
= sum' ((0+1)+2) [3,4]
= sum' (((0+1)+2)+3) [4]
= sum' ((((0+1)+2)+3)+4) []
= ((((0+1)+2)+3)+4)
= (((1+2)+3)+4)
= ((3+3)+4)
= (6+4)
= 10

When the list is large enough, evaluating this long chain of thunks can still cause a stack overflow.

seq

seq :: a -> b -> b

This lets us control the evaluation order: it evaluates a first (to weak head normal form), then returns b

let x = 1+2 in seq x (f x)

reduce the thunk before applying f

sum'

sum' acc [] = acc
sum' acc (x:xs) = let z = (acc+x) in seq z (sum' z xs)

seq :: a -> b -> b
it would evaluate a first, then return b

sum'

sum' 0 [1,2,3,4]
= sum' (1) [2,3,4]
= sum' (3) [3,4]
= sum' (6) [4]
= sum' (10) []
= 10

No more stack overflow
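A runnable sketch of the seq version on a larger list. The function is renamed to sumSeq here (a made-up name) so it does not clash with the Prelude, and foldl' from Data.List is shown next to it because it forces its accumulator in the same way.

```haskell
import Data.List (foldl')

-- The seq-based accumulator version, specialised to Int.
sumSeq :: Int -> [Int] -> Int
sumSeq acc []     = acc
sumSeq acc (x:xs) = let z = acc + x in z `seq` sumSeq z xs

main :: IO ()
main = do
  -- The accumulator is reduced at every step, so no thunk chain builds up.
  print (sumSeq 0 [1 .. 1000000])
  -- foldl' is the library counterpart of the same idea.
  print (foldl' (+) 0 [1 .. 1000000 :: Int])
```

In everyday code, reaching for foldl' is usually simpler than writing the recursion by hand.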

Bang Patterns

{-# LANGUAGE BangPatterns #-}

sum' !acc [] = acc
sum' !acc (x:xs) = sum' (acc+x) xs

For convenience, so you don’t have to write so many ‘seq’s
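The convenience shows more clearly once there is more than one accumulator. The sketch below is not from the slides: sumBang and mean are illustrative names, and mean keeps a running sum and a running count, both forced with bang patterns.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Bang-pattern accumulator sum, specialised to Int.
sumBang :: Int -> [Int] -> Int
sumBang !acc []     = acc
sumBang !acc (x:xs) = sumBang (acc + x) xs

-- Hypothetical helper: arithmetic mean with two strict accumulators.
-- Returns 0 for the empty list to keep the sketch total.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []     = if n == 0 then 0 else s / fromIntegral n
    go !s !n (y:ys) = go (s + y) (n + 1) ys

main :: IO ()
main = do
  print (sumBang 0 [1 .. 100])
  print (mean [1, 2, 3, 4])
```

Writing the same thing with seq would need one seq per accumulator at every recursive call.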

deepseq

import Control.DeepSeq

deepseq :: NFData a => a -> b -> b
deepseq a b = rnf a `seq` b

-- A class of types that can be fully evaluated.
class NFData a where
  rnf :: a -> ()
  rnf a = a `seq` ()

NFData = Normal Form Data

rnf = reduce to normal form

deepseq

instance NFData a => NFData [a] where
  rnf [] = ()
  rnf (x:xs) = rnf x `seq` rnf xs

instance (NFData a, NFData b, NFData c) => NFData (a,b,c) where
  rnf (x,y,z) = rnf x `seq` rnf y `seq` rnf z
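To see the difference between seq and deepseq, here is a small sketch that assumes the deepseq package is available (it ships with GHC). The list boxes is a made-up example with a bottom hidden in its last element.

```haskell
import Control.DeepSeq (deepseq)
import Control.Exception (SomeException, evaluate, try)

-- A list with a bottom hidden in the last box.
boxes :: [Int]
boxes = [1, 2, undefined]

main :: IO ()
main = do
  -- seq stops at the first (:) constructor, so the bad element is never seen.
  putStrLn (boxes `seq` "seq: fine")
  -- deepseq opens every box and runs into the bottom.
  r <- try (evaluate (boxes `deepseq` "deepseq: fine"))
         :: IO (Either SomeException String)
  putStrLn (either (const "deepseq: hit bottom") id r)
```

seq only reaches weak head normal form, while deepseq reduces all the way to normal form. That is why only the second call trips over the undefined element.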

Boxed vs Unboxed

The finite-precision integer type Int covers at least the range [ -2^29, 2^29 - 1]. As Int is an instance of the Bounded class, maxBound and minBound can be used to determine the exact Int range defined by an implementation

From Haskell98 Standard

One might imagine numbers naively represented in Haskell "as pointer to a heap-allocated object" which is either an unevaluated closure or is a "box" containing the number's actual value, which has now overwritten the closure

From HaskellWiki

No Definition in the Standard

Boxed vs Unboxed

It is a GHC implementation detail. It is not defined in the Standard, and it could be different in other implementations.

Memory Layout of an Int

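[ I# | Int# ] (one box is one machine word)

An evaluated Int occupies two words on the heap in GHC: one header word for the I# constructor, followed by one word holding the raw Int# value. Any reference to it is a word-sized pointer to this heap object.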

Boxed vs Unboxed

In GHC, types whose names end in a hash are unboxed types: Int#, Float#, Double#, ...

Memory Layout of an Int#

[ Int# ]

Only one machine word

(Int, Int)

Memory Layout of an (Int, Int)

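[Diagram: the pair’s header word and two pointer fields, each pointing to its own I# | Int# box]

7 machine words in total: 3 for the pair itself (header plus two pointers) and 2 for each boxed Int.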

Unboxed Type

{-# LANGUAGE MagicHash #-}
import GHC.Prim

data IntPair = IP Int# Int#

Memory Layout of an IntPair

[ IP | Int# | Int# ]

3 machine words in total.

UNPACK

data IntPair = IP {-# UNPACK #-} !Int
                  {-# UNPACK #-} !Int

Memory Layout of an IntPair

[ IP | Int# | Int# ]

3 machine words in total.
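A small usage sketch of the unpacked pair. The helper addPair and the input list are invented for this illustration, and foldl' from Data.List keeps the running pair evaluated.

```haskell
import Data.List (foldl')

-- The strict, unpacked pair from the slide, with Show and Eq for printing and comparison.
data IntPair = IP {-# UNPACK #-} !Int
                  {-# UNPACK #-} !Int
  deriving (Show, Eq)

-- Hypothetical helper: component-wise addition.
addPair :: IntPair -> IntPair -> IntPair
addPair (IP a b) (IP c d) = IP (a + c) (b + d)

main :: IO ()
main =
  -- foldl' forces each intermediate IP, and the strict fields force both
  -- components along with it, so no thunks pile up inside the pair.
  print (foldl' addPair (IP 0 0) [IP i (2 * i) | i <- [1 .. 1000]])
```

The strict fields are what make the pair safe to use as an accumulator. UNPACK is a request to GHC to store the two Int# values directly in the constructor, and it does not change what the program computes.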

Real World Examples

#ifdef __GLASGOW_HASKELL__
data UArray i e = UArray !i !i !Int ByteArray#
#endif

-- | Boxed vectors, supporting efficient slicing.
data Vector a = Vector {-# UNPACK #-} !Int
                       {-# UNPACK #-} !Int
                       {-# UNPACK #-} !(Array a)
        deriving ( Typeable )

Epilogue

To write high-performance Haskell (or, more specifically, GHC) code, you have to understand strictness and unboxed types thoroughly.

Thank you

Should I ask McDonald’s for sponsorship? lol