Fixed point math, texturing and...

46
Fixed point math, texturing and texture caching Michael Doggett Department of Computer Science Lund university

Transcript of Fixed point math, texturing and...

Fixed point math, texturing and texture

cachingMichael Doggett

Department of Computer ScienceLund university

Pixel shader

Vertex shader

Last week’s stage of the Graphics Pipeline

Rasterization

FrameBuffer

Z & Alpha

Edge functions

Vertex positioning

Traversal

Interpolation

Last weekRasterization

© 2009 Tomas Akenine-Möller 4

But first, Assignment 1!• C++ programming, but very localized in

functions where you should add code–C++ should be no problem (if it is, then ask on the

forum)• Only uses simple OpenGL

–should work on any GPU•But requires Windows

© 2009 Tomas Akenine-Möller 5

Assignment 1• Two small parts in this assignment:

–Find three bad things in small scenes• Fix the code so that correct behaviour is obtained

–Use a texture cache• Should be able to reduce texture bandwidth to 10-15%

© 2009 Tomas Akenine-Möller 6

Overview• Theory:

–Fixed-point math (Appendix A – online)–Texturing (Chapter 5 – online)–Texture caching (see assigned papers)

• Caches (Section 5.5 in notes)–For assignment 1, it will help to read chapters 2 and 3

as well (online)

• Practice:–More about the rasterizer framework for assignment 1–More about the actual assignment

© 2009 Tomas Akenine-Möller 7

Fixed-point math• Not floating point... • Good to know!• Essential for hardware design• Can be used for performance optimizations

© 2009 Tomas Akenine-Möller 8

Integer vs fixed-point• An 8-bit integer: b7 b6 b5 b4 b3 b2 b1 b0

• bi is ”worth” 2i as usual–But where is the decimal point?

b7 b6 b5 b4 b3 b2 b1 b0 00000...

• What if we move it to the left?

• bi is still ”worth” 2i : b-1=0.5, b-2=0.25, ...

b5 b4 b3 b2 b1 b0 b-1 b-2 00000...

© 2009 Tomas Akenine-Möller 9

What is fixed?• The decimal point...• A fixed-point number has a representation of [i.f] bits

– i bits for the integer part (with sign, or without)• We assume that two’s-complement is used, i.e., integer math can be

used– f bits for the fractional part

• Look at the fractional bits...

f1 f2 f3 f4

Decimal point

Worth 1/21=0.5Worth 1/22=0.25

1/23=0.125

1/24=0.0625

© 2009 Tomas Akenine-Möller 10

Resolution• f fractional bits ! resolution is 2-f

• Examples:

© 2009 Tomas Akenine-Möller 11

How to maintain the best accuracy?

• The number of bits needed for exact accuracy is increased after each mathematical operation (e.g., addition)–Overflow

• We focus on–Addition/subtraction–Multiplication

• Reason: needed for part of assignment 1

© 2009 Tomas Akenine-Möller 12

Conversion:to fixed and back again

• We have floating point number, a, and want fixed-point, [i.f]

• To fixed:• Nice thing: we now have an integer, and so can use

integer addition, mult etc (but see next slides on that)

• Rounding is implemented:

• If we have fixed-point number, b, we get a float as:

• x 2f and x 2-f are implemented as left and right shifts (fast!)

© 2009 Tomas Akenine-Möller 13

Very simple example:• We have float b=0.25• And want to represent it in fixed-point with 3

fractional bits, i.e., f=3• round(0.25*23)=2• Thus 2 is the fixed-point representation of

0.25 with three fractional bits• Can look at the 8 bits of the integer:

–0000.010 (= 2 if you disregard the decimal point)

© 2009 Tomas Akenine-Möller 14

Addition precision

• Why? Imagine the worse case:–Both numbers hold their maximum number:

• Eg 111.11b +111.11b = 1111.10b

• Result grows by one bit in integer part!

• Number of bits becomes:

=?

Note ”+1”

© 2009 Tomas Akenine-Möller 15

Multiplication precision• More complex. Can be seen as many adds!

–So intuitively, should need more bits to store

• Note, that if you want to maintain exact accuracy, we need to move the ”fixed-point”–Need twice as many fractional bits!

• In general:

• See appendix A for an explanation– Basically, a mult is a series of additions of shifted numbers

© 2009 Tomas Akenine-Möller 16

Fixed-point in practice• In C++ code, you deal with these as int’s

–32 bit signed numbers (but you need not use all of the bits)

• However, you need to prepare the calculations so that bits are not lost

• For edge functions it is of utmost importance to maintain exact values–(after you have rounded off floating-point screen space

coordinates to sub-pixel fixed-point coords)• Example: a and b are [8.2]. If you write:–c=a*b; // then c is [16.4] –d=c>>2; // d is now 16.2 format – // (but we’ve lost 2 LSB – // fractional bits)

© 2009 Tomas Akenine-Möller 17

Fixed-point example•float a=2.75f; •int ai=int(a*(1<<2)+0.5); // [2.2] •// should use floatToFixed()

•float b=2.5f; •int bi=int(b*(1<<1)+0.5); //[2.1] •// how to compute ai+bi? •int ci=ai+(bi<<1); // [3.2] bits

© 2009 Tomas Akenine-Möller 18

End of fixed-point...• In software frame work, a function int floatToFixed(fracBits, float_number) is used.

• When you do a matrix/vector multiply–You often do [16.16]*[16.16] ~=[32.16], or worse

• Remember–Full accuracy needed for edge-functions

• Read appendix A and chapter on Edge Funcs again–Available on course website

Pixel shader

Vertex shader

Last week’s stage of the Graphics Pipeline

Rasterization

FrameBuffer

Z & Alpha

Pixel shader

Vertex shader

Today’s stage of the Graphics Pipeline

Rasterization

FrameBuffer

Z & Alpha

Texture

© 2009 Tomas Akenine-Möller 21

Texturing – the tiny details

• Surprisingly simple technique–Extremely powerful, especially with programmable

shaders–Simplest form: ”glue” images onto surfaces (or

lines, or points)

Image from ”lpics”-paper by Pellacini et al. SIGGRAPH 2005PIXAR Animation Studios

© 2009 Tomas Akenine-Möller 22

Texture space, (s,t)

• Texture resolution, often 2a x 2b texels• The ck are texture coordinates, and belong to a

triangle’s vertices• When rasterizing a triangle, we get (u,v)

interpolation parameters for each pixel (x,y):–Thus the texture coords at (x,y) are:

(u,v)

(s,t)

© 2009 Tomas Akenine-Möller 23

A texture image + coord systems

• An wxh=8x4 texture. –(s,t) are independent of texture resolution–(sw,th) depend on the resolution, and are used to

access texels…• Each pixel in a Texture is called a “Texel”

© 2009 Tomas Akenine-Möller

Texture filtering

MINIFICATION MAGNIFICATION

What do weget here?

© 2008 Tomas Akenine-Möller

• We basically want the sum of the texels in the footprint (dark gray) to the right

© 2009 Tomas Akenine-Möller 25

Texture magnification (1)

• Middle: nearest neighbor – just pick nearest texel

• Right: bilinear filtering: use the four closest texels, and weight them according to actual sampling point

© 2009 Tomas Akenine-Möller 26

• Bilinear filtering is simply, linear filtering in x:

Texture magnification (2)

t00 t10

t11t01

• Followed by linear filtering in y:

© 2009 Tomas Akenine-Möller 27

Texture minification• If nearest neighbor or bilinear filtering is used,

then serious flickering will result–Extremely annoying

Nearestneighbor

Trilinearmipmapping

For a pixel here, there is a 50%chance of getting a black texel

© 2009 Tomas Akenine-Möller 28

Texture minification: mipmapping

An image pyramidof low-passfiltered images

© 2009 Tomas Akenine-Möller

© 2009 Tomas Akenine-Möller 29

Trilinear Mipmapping (1)

• Basic idea:–Approximate (dark gray footprint) with square–Then we can use texels in mipmap pyramid

© 2009 Tomas Akenine-Möller 30

Trilinear mipmapping (2)

• Compute d (LOD) (see Chapter 5), and then use two closest mipmap levels– In example above, level 1 & 2

• Bilinear filtering in each level, and then linear blend between these colors ! trilinear interpolation

• Nice bonus: makes for much better texture cache usage

demo1.exe

Text

© mmxiii mcd

TextureCaching

© 2009 Tomas Akenine-Möller 32

Texture caching• Without a cache, we can get ridiculously expensive

texturing... • Basic idea is: just use a cache for recently accessed

texels–Since we access coherently, hit rate should be quite

high!– In hardware, a cache can be:

• A small SRAM memory, or• A set of flipflops• We assume that an access in the cache is for ”free”

• In the assignment, texture filtering (eg mipmapping) is done for you.–You should experiment with caching parameters!

© 2009 Tomas Akenine-Möller 33

Assumptions: memory architecture• Accesses to external memory are expensive

–Both in time and from energy perspective–Bursting (i.e., send a sequence of continuous

words) is often (much) cheaper• E.g., fetching 8x 32-bit words (32 bytes) in a sequence is much faster than fetching 8x 32-bit words that are in random places...

Externalmemory

FragmentGeneration

Triangles

Textured fragments

PIPE

LIN

E

Texture cache

© 2009 Tomas Akenine-Möller 34

Texture cache readings• A nice introduction :

– My “Texture Caches” paper from IEEE Micro 2012• Also :

– ”The Design and Analysis of a Cache Architecture for Texture Mapping”, by Hakura and Gupta, in ISCA 97.

– ”Prefetching in a Texture Cache Architecture”, by Igehy et al, in Graphics Hardware 1998.

• Note that these are old papers, and cache sizes etc don’t apply to modern systems...• The general results still apply though

• GPU Example: NEON architecture (1998)– Built by Digital Equipment Corporation (bought by Compaq (bought by HP))– Has 256 bytes of cache, fully associative– Split into 8 different small caches

• So 8 texels can be fetched every clock cycle– Cache line size is 32 bits

• This is very small. The optimal size depends on what type of external memory you have

• More about GPU memory architecture in a later lecture

© 2009 Tomas Akenine-Möller 35

How to get good efficiency• Three important things [Hakura & Gupta]:

–How texels in texture are ordered in memory–Rasterization algorithm–Cache parameters

• Associativity–Number of cache lines = sets X ways–n - way associate cache : means n blocks(lines) in each set

• Cache line size• Cache size

© 2009 Tomas Akenine-Möller 36

Representation of textures in mem• Normally, a 4x4 texture is stored as:

–RGBA0, RGBA1, RGBA2, ... RGBA15 • What if, we traverse in the vertical

direction?–E.g., accessing 1,5,9,13–Quite bad if we read, say, 4 texels into

the cache at a time• Are better texel orderings possible?• With representation to the right, only

two blocks are read into the cache

1514131211109876543210

1514111013129876325410

• This representation will (on average) get the same performance regardless of traversal direction!!!

© 2009 Tomas Akenine-Möller 37

• This is called a ”blocked” or ”tiled” representation - “z-order”

• It is a 4D structure: first find 2x2 block, then texel in block

Representation of textures...

1514111013129876325410

• In general, we have an nxn block... –n is power of 2

• Mipmap levels can thrash at exactly the same location in a direct mapped cache

• Solution:–Use a fully associative cache–Hakura & Gupta shows that a 2-way associative cache

gives similar results–Or simpler, ”bake” the mipmap level into the computation of

the ”cache key” (tag)

© 2009 Tomas Akenine-Möller 38

Texture cache recommendations

• Tile (block) size in texture should be equal to cache line size

• Can even extend to 6D addressing–Another level, where each block is the size of the entire

cache... • Further minimizes conflict misses

–Also, Igehy et al use two separate direct-mapped caches:• One for odd mipmap levels, and one for even• Is enough to get good results• Again, one direct-mapped cache would work if the cache key

(tag) take mipmap level into account (but having two caches gives more bandwith from the caches)

© 2009 Tomas Akenine-Möller 39

Traversal algorithm• Traversal algorithm affects the order in which

texels are accessed !–Also influences texture caching...

• With scanline-based traversal, we do not get any positive effects for pixels below current scanline–This is assuming a small cache–Positive effects should be possible, due to bilinear

filtering (used in mipmapping and magnification)

• Tiled traversal performs better!–Especially for large triangles

© 2009 Tomas Akenine-Möller 40

Why is mipmapping good for texture caching?• We choose mipmap levels to

access where footprint becomes ~1 texel

• Therefore, traversal moves slowly in texture space ! many cache hits!

• Better than nearest neighbor (minification)

© 2009 Tomas Akenine-Möller 41

Back to the assignment...The coding framework (1)

• Implements a subset of OpenGL–(mostly focused on the rasterizer)

• Designed so –that is, it is built around units that exist in real

hardware• Programmability

–We have fragment shaders as well–Though, focus is not on using them right now...

© 2009 Tomas Akenine-Möller 42

The coding framework (2)• Uses Microsoft Visual Studio 2008

–But upgrades to work with 2010/12/13/15• Nice feature for this assignment:

–Press the R key, and you can toggle rasterizer–You can switch from

• our software rasterizer• to the OpenGL hardware rasterizer

© 2009 Tomas Akenine-Möller 43

Actual assignment (1)• Two tasks..• Task 1:

–Switch between the software rasterizer and hardware OpenGL rasterizer (press ’R’)

–Use this to detect the ”artifacts”• Three artifacts: need to be corrected so that results are ”very near” identical to hardware OpenGL

• How could I know how to correct the artifacts?• Read the literature that we recommend!

–Everything is very localized in the source:• Change in cRasterizer.* + cEdgeFunc.*

© 2009 Tomas Akenine-Möller 44

Actual assignment (2)• Task 1: Fix pixel errors.

• Task 2:–Time to reduce texture bandwidth–In glstate.cpp, add a texture cache...–Should be able to reduce texture bandwidth to at

most 10-15%...–You need to experiment quite a bit to get this kind

of performance...

© 2009 Tomas Akenine-Möller 45

cRasterizeror

cTileRasterizer

More about the software framework

setup()

rasterizeTriangle()[i.e., triangle traversal]

For each pixelinside triangle

perFragment() Compute depth cDepthUnit – depth testPerspectiveinterpolation

Fragment shader

cTextureUnit – texel fetch & filtering

Write color cColorUnit

© mmxvi mcd

Next• Don’t forget to read the literature!

• Text has full background to the slides

• Very valuable for assignments too

• Labs

• Find a partner (ask on the forum)

• Sign up

• Check the web page for info http://cs.lth.se/edan35

• Ask questions on the forum

• Next Lecture :

• Shader programming