Pitfalls of object_oriented_programming_gcap_09

114
Tony Albrecht Technical Consultant Developer Services Sony Computer Entertainment Europe Research & Development Division Pitfalls of Object Oriented Programming

Transcript of Pitfalls of object_oriented_programming_gcap_09

Page 1: Pitfalls of object_oriented_programming_gcap_09

Tony Albrecht – Technical Consultant

Developer Services

Sony Computer Entertainment Europe

Research & Development Division

Pitfalls of Object Oriented Programming

Page 2: Pitfalls of object_oriented_programming_gcap_09

Slide 2

• A quick look at Object Oriented (OO)

programming

• A common example

• Optimisation of that example

• Summary

What I will be covering

Page 3: Pitfalls of object_oriented_programming_gcap_09

Slide 3

• What is OO programming?– a programming paradigm that uses "objects" – data structures

consisting of datafields and methods together with their interactions – to design applications and computer programs.(Wikipedia)

• Includes features such as– Data abstraction

– Encapsulation

– Polymorphism

– Inheritance

Object Oriented (OO) Programming

Page 4: Pitfalls of object_oriented_programming_gcap_09

Slide 4

• OO programming allows you to think about problems in terms of objects and their interactions.

• Each object is (ideally) self contained

– Contains its own code and data.

– Defines an interface to its code and data.

• Each object can be perceived as a „black box‟.

What’s OOP for?

Page 5: Pitfalls of object_oriented_programming_gcap_09

Slide 5

• If objects are self contained then they can be– Reused.

– Maintained without side effects.

– Used without understanding internal implementation/representation.

• This is good, yes?

Objects

Page 6: Pitfalls of object_oriented_programming_gcap_09

Slide 6

• Well, yes

• And no.

• First some history.

Are Objects Good?

Page 7: Pitfalls of object_oriented_programming_gcap_09

Slide 7

A Brief History of C++

C++ development started

1979 2009

Page 8: Pitfalls of object_oriented_programming_gcap_09

Slide 8

A Brief History of C++

Named “C++”

1979 20091983

Page 9: Pitfalls of object_oriented_programming_gcap_09

Slide 9

A Brief History of C++

First Commercial release

1979 20091985

Page 10: Pitfalls of object_oriented_programming_gcap_09

Slide 10

A Brief History of C++

Release of v2.0

1979 20091989

Page 11: Pitfalls of object_oriented_programming_gcap_09

Slide 11

A Brief History of C++

Release of v2.0

1979 20091989

Added • multiple inheritance,• abstract classes,• static member functions, • const member functions • protected members.

Page 12: Pitfalls of object_oriented_programming_gcap_09

Slide 12

A Brief History of C++

Standardised

1979 20091998

Page 13: Pitfalls of object_oriented_programming_gcap_09

Slide 13

A Brief History of C++

Updated

1979 20092003

Page 14: Pitfalls of object_oriented_programming_gcap_09

Slide 14

A Brief History of C++

C++0x

1979 2009 ?

Page 15: Pitfalls of object_oriented_programming_gcap_09

Slide 15

• Many more features have been added to C++

• CPUs have become much faster.

• Transition to multiple cores

• Memory has become faster.

So what has changed since 1979?

http://www.vintagecomputing.com

Page 16: Pitfalls of object_oriented_programming_gcap_09

Slide 16

CPU performance

Computer architecture: a quantitative approach

By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau

Page 17: Pitfalls of object_oriented_programming_gcap_09

Slide 17

CPU/Memory performance

Computer architecture: a quantitative approach

By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau

Page 18: Pitfalls of object_oriented_programming_gcap_09

Slide 18

• One of the biggest changes is that memory access speeds are far slower (relatively)

– 1980: RAM latency ~ 1 cycle

– 2009: RAM latency ~ 400+ cycles

• What can you do in 400 cycles?

What has changed since 1979?

Page 19: Pitfalls of object_oriented_programming_gcap_09

Slide 19

• OO classes encapsulate code and data.

• So, an instantiated object will generally

contain all data associated with it.

What has this to do with OO?

Page 20: Pitfalls of object_oriented_programming_gcap_09

Slide 20

• With modern HW (particularly consoles),

excessive encapsulation is BAD.

• Data flow should be fundamental to your

design (Data Oriented Design)

My Claim

Page 21: Pitfalls of object_oriented_programming_gcap_09

Slide 21

• Base Object class– Contains general data

• Node – Container class

• Modifier– Updates transforms

• Drawable/Cube– Renders objects

Consider a simple OO Scene Tree

Page 22: Pitfalls of object_oriented_programming_gcap_09

Slide 22

• Each object

– Maintains bounding sphere for culling

– Has transform (local and world)

– Dirty flag (optimisation)

– Pointer to Parent

Object

Page 23: Pitfalls of object_oriented_programming_gcap_09

Slide 23

Objects

Class Definition Memory LayoutEach square is

4 bytes

Page 24: Pitfalls of object_oriented_programming_gcap_09

Slide 24

• Each Node is an object, plus

– Has a container of other objects

– Has a visibility flag.

Nodes

Page 25: Pitfalls of object_oriented_programming_gcap_09

Slide 25

Nodes

Class Definition Memory Layout

Page 26: Pitfalls of object_oriented_programming_gcap_09

Slide 26

Consider the following code…

• Update the world transform and world

space bounding sphere for each object.

Page 27: Pitfalls of object_oriented_programming_gcap_09

Slide 27

Consider the following code…

• Leaf nodes (objects) return transformed

bounding spheres

Page 28: Pitfalls of object_oriented_programming_gcap_09

Slide 28

Consider the following code…

• Leaf nodes (objects) return transformed

bounding spheresWhat‟s wrong with this

code?

Page 29: Pitfalls of object_oriented_programming_gcap_09

Slide 29

Consider the following code…

• Leaf nodes (objects) return transformed

bounding spheresIf m_Dirty=false then we get branch

misprediction which costs 23 or 24

cycles.

Page 30: Pitfalls of object_oriented_programming_gcap_09

Slide 30

Consider the following code…

• Leaf nodes (objects) return transformed

bounding spheresCalculation of the world bounding sphere

takes only 12 cycles.

Page 31: Pitfalls of object_oriented_programming_gcap_09

Slide 31

Consider the following code…

• Leaf nodes (objects) return transformed

bounding spheresSo using a dirty flag here is actually

slower than not using one (in the case

where it is false)

Page 32: Pitfalls of object_oriented_programming_gcap_09

Slide 32

Lets illustrate cache usage

Main Memory L2 CacheEach cache line is

128 bytes

Page 33: Pitfalls of object_oriented_programming_gcap_09

Slide 33

Cache usage

Main Memory

parentTransform is already

in the cache (somewhere)

L2 Cache

Page 34: Pitfalls of object_oriented_programming_gcap_09

Slide 34

Cache usage

Main MemoryAssume this is a 128byte

boundary (start of cacheline) L2 Cache

Page 35: Pitfalls of object_oriented_programming_gcap_09

Slide 35

Cache usage

Main Memory

Load m_Transform into cache

L2 Cache

Page 36: Pitfalls of object_oriented_programming_gcap_09

Slide 36

Cache usage

Main Memory

m_WorldTransform is stored via

cache (write-back)

L2 Cache

Page 37: Pitfalls of object_oriented_programming_gcap_09

Slide 37

Cache usage

Main Memory

Next it loads m_Objects

L2 Cache

Page 38: Pitfalls of object_oriented_programming_gcap_09

Slide 38

Cache usage

Main Memory

Then a pointer is pulled from

somewhere else (Memory

managed by std::vector)

L2 Cache

Page 39: Pitfalls of object_oriented_programming_gcap_09

Slide 39

Cache usage

Main Memory

vtbl ptr loaded into Cache

L2 Cache

Page 40: Pitfalls of object_oriented_programming_gcap_09

Slide 40

Cache usage

Main Memory

Look up virtual function

L2 Cache

Page 41: Pitfalls of object_oriented_programming_gcap_09

Slide 41

Cache usage

Main Memory

Then branch to that code

(load in instructions)

L2 Cache

Page 42: Pitfalls of object_oriented_programming_gcap_09

Slide 42

Cache usage

Main Memory

New code checks dirty flag then

sets world bounding sphere

L2 Cache

Page 43: Pitfalls of object_oriented_programming_gcap_09

Slide 43

Cache usage

Main Memory

Node‟s World Bounding Sphere

is then Expanded

L2 Cache

Page 44: Pitfalls of object_oriented_programming_gcap_09

Slide 44

Cache usage

Main Memory

Then the next Object is

processed

L2 Cache

Page 45: Pitfalls of object_oriented_programming_gcap_09

Slide 45

Cache usage

Main Memory

First object costs at least 7

cache misses

L2 Cache

Page 46: Pitfalls of object_oriented_programming_gcap_09

Slide 46

Cache usage

Main Memory

Subsequent objects cost at least

2 cache misses each

L2 Cache

Page 47: Pitfalls of object_oriented_programming_gcap_09

Slide 47

• 11,111 nodes/objects in a tree 5 levels deep

• Every node being transformed

• Hierarchical culling of tree

• Render method is empty

The Test

Page 48: Pitfalls of object_oriented_programming_gcap_09

Slide 48

Performance

This is the time

taken just to

traverse the tree!

Page 49: Pitfalls of object_oriented_programming_gcap_09

Slide 49

Why is it so slow?~22ms

Page 50: Pitfalls of object_oriented_programming_gcap_09

Slide 50

Look at GetWorldBoundingSphere()

Page 51: Pitfalls of object_oriented_programming_gcap_09

Slide 51

Samples can be a little

misleading at the source

code level

Page 52: Pitfalls of object_oriented_programming_gcap_09

Slide 52

if(!m_Dirty) comparison

Page 53: Pitfalls of object_oriented_programming_gcap_09

Slide 53

Stalls due to the load 2

instructions earlier

Page 54: Pitfalls of object_oriented_programming_gcap_09

Slide 54

Similarly with the matrix

multiply

Page 55: Pitfalls of object_oriented_programming_gcap_09

Slide 55

Some rough calculations

Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms

Page 56: Pitfalls of object_oriented_programming_gcap_09

Slide 56

Some rough calculations

Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms

L2 Cache misses: 36,345 @ 400 cycles each ~= 4.54ms

Page 57: Pitfalls of object_oriented_programming_gcap_09

Slide 57

• From Tuner, ~ 3 L2 cache misses per object

– These cache misses are mostly sequential (more than 1 fetch from main memory can happen at once)

– Code/cache miss/code/cache miss/code…

Page 58: Pitfalls of object_oriented_programming_gcap_09

Slide 58

• How can we fix it?

• And still keep the same functionality and

interface?

Slow memory is the problem here

Page 59: Pitfalls of object_oriented_programming_gcap_09

Slide 59

• Use homogenous, sequential sets of data

The first step

Page 60: Pitfalls of object_oriented_programming_gcap_09

Slide 60

Homogeneous Sequential Data

Page 61: Pitfalls of object_oriented_programming_gcap_09

Slide 61

• Use custom allocators

– Minimal impact on existing code

• Allocate contiguous

– Nodes

– Matrices

– Bounding spheres

Generating Contiguous Data

Page 62: Pitfalls of object_oriented_programming_gcap_09

Slide 62

Performance

19.6ms -> 12.9ms

35% faster just by

moving things around in

memory!

Page 63: Pitfalls of object_oriented_programming_gcap_09

Slide 63

• Process data in order

• Use implicit structure for hierarchy

– Minimise to and fro from nodes.

• Group logic to optimally use what is already in cache.

• Remove regularly called virtuals.

What next?

Page 64: Pitfalls of object_oriented_programming_gcap_09

Slide 64

Hierarchy

Node

We start with

a parent Node

Page 65: Pitfalls of object_oriented_programming_gcap_09

Slide 65

Hierarchy

Node

Node NodeWhich has children

nodes

Page 66: Pitfalls of object_oriented_programming_gcap_09

Slide 66

Hierarchy

Node

Node Node

And they have a

parent

Page 67: Pitfalls of object_oriented_programming_gcap_09

Slide 67

Hierarchy

Node

Node Node

Node Node Node NodeNode Node Node Node

And they have

children

Page 68: Pitfalls of object_oriented_programming_gcap_09

Slide 68

Hierarchy

Node

Node Node

Node Node Node NodeNode Node Node Node

And they all have

parents

Page 69: Pitfalls of object_oriented_programming_gcap_09

Slide 69

Hierarchy

Node

Node Node

Node Node Node NodeNode Node Node Node

A lot of this

information can be

inferred

Node Node Node NodeNode Node Node Node

Page 70: Pitfalls of object_oriented_programming_gcap_09

Slide 70

Hierarchy

Node

Node Node

Node Node Node NodeNode Node Node Node

Use a set of arrays,

one per hierarchy

level

Page 71: Pitfalls of object_oriented_programming_gcap_09

Slide 71

Hierarchy

Node

Node Node

Node Node Node NodeNode Node Node Node

Parent has 2

children:2

:4:4

Children have 4

children

Page 72: Pitfalls of object_oriented_programming_gcap_09

Slide 72

Hierarchy

Node

Node Node

Node Node Node NodeNode Node Node Node

Ensure nodes and their

data are contiguous in

memory:2

:4:4

Page 73: Pitfalls of object_oriented_programming_gcap_09

Slide 73

• Make the processing global rather than local

– Pull the updates out of the objects.• No more virtuals

– Easier to understand too – all code in one place.

Page 74: Pitfalls of object_oriented_programming_gcap_09

Slide 74

• OO version

– Update transform top down and expand WBS

bottom up

Need to change some things…

Page 75: Pitfalls of object_oriented_programming_gcap_09

Slide 75

Node

Node Node

Node Node Node NodeNode Node Node Node

Update

transform

Page 76: Pitfalls of object_oriented_programming_gcap_09

Slide 76

Node

Node Node

Node Node Node NodeNode Node Node Node

Update

transform

Page 77: Pitfalls of object_oriented_programming_gcap_09

Slide 77

Node

Node Node

Node Node Node NodeNode Node Node Node

Update transform and

world bounding sphere

Page 78: Pitfalls of object_oriented_programming_gcap_09

Slide 78

Node

Node Node

Node Node Node NodeNode Node Node Node

Add bounding sphere

of child

Page 79: Pitfalls of object_oriented_programming_gcap_09

Slide 79

Node

Node Node

Node Node Node NodeNode Node Node Node

Update transform and

world bounding sphere

Page 80: Pitfalls of object_oriented_programming_gcap_09

Slide 80

Node

Node Node

Node Node Node NodeNode Node Node Node

Add bounding sphere

of child

Page 81: Pitfalls of object_oriented_programming_gcap_09

Slide 81

Node

Node Node

Node Node Node NodeNode Node Node Node

Update transform and

world bounding sphere

Page 82: Pitfalls of object_oriented_programming_gcap_09

Slide 82

Node

Node Node

Node Node Node NodeNode Node Node Node

Add bounding sphere

of child

Page 83: Pitfalls of object_oriented_programming_gcap_09

Slide 83

• Hierarchical bounding spheres pass info up

• Transforms cascade down

• Data use and code is „striped‟.

– Processing is alternating

Page 84: Pitfalls of object_oriented_programming_gcap_09

Slide 84

• To do this with a „flat‟ hierarchy, break it

into 2 passes

– Update the transforms and bounding

spheres(from top down)

– Expand bounding spheres (bottom up)

Conversion to linear

Page 85: Pitfalls of object_oriented_programming_gcap_09

Slide 85

Transform and BS updates

Node

Node Node

Node Node Node NodeNode Node Node Node

For each node at each level (top down)

{

multiply world transform by parent‟s

transform wbs by world transform

}

:2

:4:4

Page 86: Pitfalls of object_oriented_programming_gcap_09

Slide 86

Update bounding sphere hierarchies

Node

Node Node

Node Node Node NodeNode Node Node Node

For each node at each level (bottom up)

{

add wbs to parent‟s

}

:2

:4:4

For each node at each level (bottom up)

{

add wbs to parent‟s

cull wbs against frustum

}

Page 87: Pitfalls of object_oriented_programming_gcap_09

Slide 87

Update Transform and Bounding Sphere

How many children nodes

to process

Page 88: Pitfalls of object_oriented_programming_gcap_09

Slide 88

Update Transform and Bounding Sphere

For each child, update

transform and bounding

sphere

Page 89: Pitfalls of object_oriented_programming_gcap_09

Slide 89

Update Transform and Bounding Sphere

Note the contiguous arrays

Page 90: Pitfalls of object_oriented_programming_gcap_09

Slide 90

So, what’s happening in the cache?Parent

Children

Parent Data

Childrens‟ Data

Children‟s node

data not needed

Unified L2 Cache

Page 91: Pitfalls of object_oriented_programming_gcap_09

Slide 91

Load parent and its transformParent

Parent Data

Childrens‟ Data

Unified L2 Cache

Page 92: Pitfalls of object_oriented_programming_gcap_09

Slide 92

Load child transform and set world transformParent

Parent Data

Childrens‟ Data

Unified L2 Cache

Page 93: Pitfalls of object_oriented_programming_gcap_09

Slide 93

Load child BS and set WBSParent

Parent Data

Childrens‟ Data

Unified L2 Cache

Page 94: Pitfalls of object_oriented_programming_gcap_09

Slide 94

Load child BS and set WBSParent

Parent Data

Childrens‟ Data

Next child is calculated with

no extra cache misses ! Unified L2 Cache

Page 95: Pitfalls of object_oriented_programming_gcap_09

Slide 95

Load child BS and set WBSParent

Parent Data

Childrens‟ Data

The next 2 children incur 2

cache misses in total Unified L2 Cache

Page 96: Pitfalls of object_oriented_programming_gcap_09

Slide 96

PrefetchingParent

Parent Data

Childrens‟ Data

Because all data is linear, we

can predict what memory will

be needed in ~400 cycles

and prefetch

Unified L2 Cache

Page 97: Pitfalls of object_oriented_programming_gcap_09

Slide 97

• Tuner scans show about 1.7 cache misses

per node.

• But, these misses are much more frequent

– Code/cache miss/cache miss/code

– Less stalling

Page 98: Pitfalls of object_oriented_programming_gcap_09

Slide 98

Performance

19.6 -> 12.9 -> 4.8ms

Page 99: Pitfalls of object_oriented_programming_gcap_09

Slide 99

• Data accesses are now predictable

• Can use prefetch (dcbt) to warm the cache

– Data streams can be tricky

– Many reasons for stream termination

– Easier to just use dcbt blindly • (look ahead x number of iterations)

Prefetching

Page 100: Pitfalls of object_oriented_programming_gcap_09

Slide 100

• Prefetch a predetermined number of

iterations ahead

• Ignore incorrect prefetches

Prefetching example

Page 101: Pitfalls of object_oriented_programming_gcap_09

Slide 101

Performance

19.6 -> 12.9 -> 4.8 -> 3.3ms

Page 102: Pitfalls of object_oriented_programming_gcap_09

Slide 102

• This example makes very heavy use of the cache

• This can affect other threads‟ use of the cache

– Multiple threads with heavy cache use may thrash the cache

A Warning on Prefetching

Page 103: Pitfalls of object_oriented_programming_gcap_09

Slide 103

The old scan~22ms

Page 104: Pitfalls of object_oriented_programming_gcap_09

Slide 104

The new scan

~16.6ms

Page 105: Pitfalls of object_oriented_programming_gcap_09

Slide 105

Up close

~16.6ms

Page 106: Pitfalls of object_oriented_programming_gcap_09

Slide 106

Looking at the code (samples)

Page 107: Pitfalls of object_oriented_programming_gcap_09

Slide 107

Performance countersBranch mispredictions: 2,867 (cf. 47,000)

L2 cache misses: 16,064 (cf 36,000)

Page 108: Pitfalls of object_oriented_programming_gcap_09

Slide 108

• Just reorganising data locations was a win

• Data + code reorganisation= dramatic

improvement.

• + prefetching equals even more WIN.

In Summary

Page 109: Pitfalls of object_oriented_programming_gcap_09

Slide 109

• Be careful not to design yourself into a corner

• Consider data in your design

– Can you decouple data from objects?

– …code from objects?

• Be aware of what the compiler and HW are doing

OO is not necessarily EVIL

Page 110: Pitfalls of object_oriented_programming_gcap_09

Slide 110

• Optimise for data first, then code.

– Memory access is probably going to be your biggest bottleneck

• Simplify systems

– KISS

– Easier to optimise, easier to parallelise

Its all about the memory

Page 111: Pitfalls of object_oriented_programming_gcap_09

Slide 111

• Keep code and data homogenous

– Avoid introducing variations

– Don‟t test for exceptions – sort by them.

• Not everything needs to be an object

– If you must have a pattern, then consider using Managers

Homogeneity

Page 112: Pitfalls of object_oriented_programming_gcap_09

Slide 112

• You are writing a GAME

– You have control over the input data

– Don‟t be afraid to preformat it – drastically if need be.

• Design for specifics, not generics (generally).

Remember

Page 113: Pitfalls of object_oriented_programming_gcap_09

Slide 113

• Better performance

• Better realisation of code optimisations

• Often simpler code

• More parallelisable code

Data Oriented Design Delivers

Page 114: Pitfalls of object_oriented_programming_gcap_09

Slide 114

• Questions?

The END