Page 1: Internal Memory & Cache Chapter 15 C6000 Integration Workshop Copyright © 2005 Texas Instruments. All rights reserved. T TO Technical Training Organization.

Internal Memory & Cache

Chapter 15

C6000 Integration Workshop

Copyright © 2005 Texas Instruments. All rights reserved.

Technical Training Organization

Page 2:

Outline
Why Cache?
Cache Basics
Cache Example (Direct-Mapped)
C6211/C671x Internal Memory
'C64x Internal Memory Overview
Additional Memory/Cache Topics
Using the C Optimizer
Lab 15

Page 3:

Parking Dilemma

[Figure: a Sports Arena with Close Parking (0 minute walk, 10 spaces, $100/space) and a Distant Parking-Ramp (10 minute walk, 1000 spaces, $5/space).]

Parking Choices:
0 minute walk @ $100 for close-in parking
10 minute walk @ $5 for distant parking
or …

Page 4:

Parking Dilemma

[Figure: a Sports Arena with Close Parking (0 minute walk, 10 spaces, $100/space) and a Distant Parking-Ramp (10 minute walk, 1000 spaces, $5/space).]

Parking Choices:
0 minute walk @ $100 for close-in parking
10 minute walk @ $5 for distant parking
or … Valet parking: 0 minute walk @ only $6.00

How does this compare to cache memory?

Page 5:

Why Cache?

[Figure: Sports Arena, Fast Memory (fast, small), and Bulk Memory (slower, larger, cheaper).]

Memory Choices:
Small, fast memory
Large, slow memory
or …

Page 6:

Why Cache?

[Figure: Sports Arena, Cache Memory (fast, small), and Bulk Memory (slower, larger, cheaper); together they work like big, fast memory.]

Memory Choices:
Small, fast memory
Large, slow memory
or … Use Cache:
Combines advantages of both
Like valet, data movement is automatic

Page 8:

Using Internal RAM

[Figure: CPU, Internal RAM, EDMA, and EMIF; External Memory (0x8000) holds the Program: func1, func2, func3.]

Before executing functions (e.g. func1), they must be transferred to Internal Memory
The Programmer has to set this up
If all functions can't fit at once, it becomes more complicated (i.e. overlays)

Page 9:

Using Cache Memory

[Figure: CPU, Cache, Cache H/W, and EMIF; External Memory (0x8000) holds the Program: func1, func2, func3.]

Cache hardware automatically transfers code/data to internal memory, as needed
Addresses in the Memory Map are associated with locations in cache
Cache locations do not have their own addresses

Let's start with Basic Concepts of a Cache …

Page 10:

Cache: Block, Line, Index

Conceptually, a cache divides the entire memory into blocks equal to its size

[Figure: External Memory starting at 0x8000 with one Block marked and further boundaries at 0x8010 and 0x8020, beside a Cache with locations 0 to 0xF.]

Page 11:

Cache: Block, Line, Index

Conceptually, a cache divides the entire memory into blocks equal to its size
A cache is divided into smaller storage locations called lines

[Figure: External Memory starting at 0x8000 with one Block marked and further boundaries at 0x8010 and 0x8020, beside a Cache with locations 0 to 0xF; one location is labeled Cache Line.]

Page 12:

Cache: Block, Line, Index

Conceptually, a cache divides the entire memory into blocks equal to its size
A cache is divided into smaller storage locations called lines
The term Index or Line-Number is used to specify a specific cache line

[Figure: External Memory starting at 0x8000 with one Block marked and further boundaries at 0x8010 and 0x8020, beside a Cache with locations 0 to 0xF; labels mark a Cache Line and its Index.]

How do we know which block is cached?

Page 13:

Cache Tags

A Tag value keeps track of which block is associated with a cache line

[Figure: External Memory blocks at 0x8000, 0x8010, 0x8020 beside a Cache with Tag and Index columns, indexes 0 to 0xF; tag 800 is shown at index 0.]

What could be better than one tag per block?

Page 14:

Cache Tags

A Tag value keeps track of which block is associated with a cache block
Each line has its own tag, so the whole cache block won't be erased when lines from different memory blocks need to be cached simultaneously

[Figure: External Memory blocks at 0x8000, 0x8010, 0x8020 beside a Cache with Tag and Index columns, indexes 0 to 0xF; tag 800 is shown at index 0 and tag 801 at index 1.]

How do we know a cache line is valid (or not)?

Page 15:

Valid Bits

A Valid bit keeps track of which lines contain "real" information
They are set by the cache hardware whenever new code or data is stored

[Figure: External Memory blocks at 0x8000, 0x8010, 0x8020 beside a Cache with Valid, Tag, and Index columns; tags 800 (index 0), 801 (index 1), and 721 (index 0xF); the Valid column reads 1, 1, ..., 0, 0.]

This type of cache is called ...

Page 16:

Direct-Mapped Cache

[Figure: External Memory with one Block marked and addresses 0x8000, 0x8010, 0x8020, beside a Cache with indexes 0 to 0xF.]

Direct-Mapped Cache associates an address within each block with one cache line
Thus … there will be only one unique cache index for any address in the memory-map
Only one block can have information in a cache line at any given time

Let's look at an example ...

Page 18:

Direct-Mapped Cache Example

[Figure: External Memory blocks at 0x8000, 0x8010, 0x8020, 0x8030 beside a Cache with Valid, Tag, and Index columns, indexes 0 to 0xF; the address format places the Tag in bits 15-4 and the Index in bits 3-0.]

Let's examine an arbitrary direct-mapped cache example:
A 16-line, direct-mapped cache requires a 4-bit index
If our example µP used 16-bit addresses, this leaves us with a 12-bit tag
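As a minimal sketch of the tag/index split described on this slide, the 16-bit address can be divided with a mask and a shift. The macro and function names are illustrative only and do not come from the workshop material.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of the slide's toy cache: 16 lines, 16-bit addresses. */
#define INDEX_BITS 4u
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)   /* 0xF */
#define TAG_MASK   0xFFFu                      /* remaining 12 bits */

/* The low 4 bits select the cache line. */
static unsigned cache_index(uint16_t addr)
{
    return addr & INDEX_MASK;
}

/* The upper 12 bits identify which memory block the line came from. */
static unsigned cache_tag(uint16_t addr)
{
    return (unsigned)addr >> INDEX_BITS;
}

/* Tag and index together give back the original address. */
static uint16_t rebuild_address(unsigned tag, unsigned index)
{
    assert(tag <= TAG_MASK && index <= INDEX_MASK);
    return (uint16_t)((tag << INDEX_BITS) | index);
}
```

For address 0026h this gives index 6 and tag 002h, and for 0x8010 it gives index 0 and tag 801h, which agrees with the tag values drawn on the Cache Tags slides.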

Page 19:

Arbitrary Direct-Mapped Cache Example

The following example uses:
16-line cache
16-bit addresses, and
Stores one 32-bit instruction per line

C6000 caches have different cache and line sizes than this example
It is only intended as a simple cache example to reinforce cache concepts

Page 20:

Conceptual Example Code

Address   Code
0003h     L1  LDH
0004h         MPY
0005h         ADD
0006h         B L2
0026h     L2  ADD
0027h         SUB cnt
0028h         [!cnt] B L1

Address format: Tag in bits 15-4, Index in bits 3-0

Page 21:

Direct Mapped Cache Example

[Figure: an empty cache table with Index (0 to F), Cache, Tag, and Valid columns, shown beside the example code.]

Address   Code
0003h     L1  LDH
0004h         MPY
0005h         ADD
0006h         B L2
0026h     L2  ADD
0027h         SUB cnt
0028h         [!cnt] B L1

Page 23:

Compulsory Miss on the first fetch (0003h, L1 LDH); the cache is still empty

Page 24:

LDH is cached at index 3, tag 000

Page 25:

MPY is cached at index 4, tag 000

Page 26:

ADD is cached at index 5, tag 000

Page 27:

B is cached at index 6, tag 000

Page 28:

Conflict Miss: ADD (0026h, tag 002) replaces B (tag 000) at index 6

Page 29:

SUB is cached at index 7, tag 002

Page 30:

B is cached at index 8, tag 002

Page 34:

B (0006h, tag 000) replaces ADD (tag 002) at index 6

Page 35:

Direct-Mapped Cache Example

Index   Cache     Tag
3       LDH       000
4       MPY       000
5       ADD       000
6       B / ADD   000 / 002
7       SUB       002
8       B         002

Notes:
This example was contrived to show how cache lines can thrash
Code thrashing is minimized on the C6000 due to relatively large cache sizes
Keeping code in contiguous sections also helps to minimize thrashing
Let's review the two types of misses that we encountered

Page 36:

Types of Misses

Compulsory
Miss when first accessing a new address

Conflict
Line is evicted upon access of an address whose index is already cached
Solutions:
Change memory layout
Allow more lines for each index

Capacity (we didn't see this in our example)
Line is evicted before it can be re-used because capacity of the cache is exhausted
Solution: Increase cache size

How can we visualize these misses?
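The behaviour walked through in the direct-mapped example can be reproduced with a small host-side simulation. This is only a sketch: the structure and function names are hypothetical, the trace is the seven instruction addresses from the example code, and misses are classified the way the slides word them (a miss into an empty line counts as compulsory, a miss that evicts a valid line counts as conflict), which is simpler than a strict textbook classification.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LINES 16u   /* 16-line cache, 4-bit index, 12-bit tag */

typedef struct {
    uint16_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
    unsigned hits;
    unsigned compulsory_misses;   /* line was empty */
    unsigned conflict_misses;     /* a valid line with another tag was evicted */
} DmCache;

static void dm_reset(DmCache *c)
{
    for (unsigned i = 0; i < NUM_LINES; i++) {
        c->tag[i] = 0;
        c->valid[i] = 0;
    }
    c->hits = c->compulsory_misses = c->conflict_misses = 0;
}

static void dm_access(DmCache *c, uint16_t addr)
{
    unsigned index = addr & (NUM_LINES - 1u);
    uint16_t tag = (uint16_t)(addr >> 4);

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return;
    }
    if (c->valid[index])
        c->conflict_misses++;
    else
        c->compulsory_misses++;
    c->tag[index] = tag;      /* fill the line from memory */
    c->valid[index] = 1;
}

/* Fetch the example's instructions in order, once per pass of the loop. */
static void dm_run_trace(DmCache *c, unsigned passes)
{
    static const uint16_t trace[] = {
        0x0003, 0x0004, 0x0005, 0x0006,   /* L1: LDH, MPY, ADD, B L2 */
        0x0026, 0x0027, 0x0028            /* L2: ADD, SUB, [!cnt] B L1 */
    };
    assert(c != 0);
    for (unsigned p = 0; p < passes; p++)
        for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
            dm_access(c, trace[i]);
}
```

One pass over the trace gives six compulsory misses and one conflict miss (0026h evicting 0006h at index 6). Each later pass adds five hits and two more conflict misses, both at index 6.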

Page 37:

CacheTune

[Figure: CacheTune plot with Memory Locations on the vertical axis and Time (# of instructions executed) on the horizontal axis; legend: Cache Hit, Hit/Miss, Cache Miss.]

Page 38:

Outline
Why Cache?
Cache Basics
Cache Example (Direct-Mapped)
C6211/C671x Internal Memory
  L1 Program Cache
  L1 Data Cache
  L2 Memory
'C64x Internal Memory Overview
Additional Memory/Cache Topics
Using the C Optimizer
Lab 15

Page 39:

'1x Internal Memory

[Figure: CPU with Program Cache (L1P) and Data Cache (L1D) at Level 1, Internal RAM or Cache (L2) at Level 2, and the EMIF leading to Level 3.]

We often refer to a system's memory in hierarchical levels
Higher levels (L1) are closer to the CPU
CPU always requests from highest level memory …
… If address isn't present in L1, cache h/w gets it from lower level

Looking more closely at L1P …

Page 40:

L1P Cache

[Figure: CPU, Program Cache (L1P), L2, EMIF, and External Memory, with the loop below as example code.]

for( i = 0; i < 10; i++ ) { sum += x[i] * y[i]; }

Cache is always on
Direct-Mapped Cache
Works exceptionally well for DSP code (which tends to have many loops)
Can be placed to minimize thrashing

How big is the cache?

Page 41:

L1P Cache

[Figure: CPU, Program Cache (L1P, 4KB), L2, EMIF, and External Memory, with the loop below as example code.]

for( i = 0; i < 10; i++ ) { sum += x[i] * y[i]; }

Cache is always on
Direct-Mapped Cache
Works exceptionally well for DSP code (which tends to have many loops)
Can be placed to minimize thrashing
The cache is 4K bytes
Each line stores 16 instructions (Linesize = 16)

What does Linesize mean?
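A rough check on the L1P numbers above, assuming 32-bit (4-byte) instructions as in the earlier example: 16 instructions per line is 64 bytes, so a 4K-byte direct-mapped cache holds 64 lines. The helper names below are made up for illustration, and the addresses in the note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define L1P_SIZE_BYTES  4096u                   /* "The cache is 4K bytes" */
#define INSTR_BYTES     4u                      /* assumed: one 32-bit instruction */
#define L1P_LINE_BYTES  (16u * INSTR_BYTES)     /* 16 instructions per line */
#define L1P_NUM_LINES   (L1P_SIZE_BYTES / L1P_LINE_BYTES)

/* Documents the derived geometry: 64-byte lines, 64 of them. */
static void l1p_check_geometry(void)
{
    assert(L1P_LINE_BYTES == 64u);
    assert(L1P_NUM_LINES == 64u);
}

/* Which cache line a program address maps to. */
static unsigned l1p_line_index(uint32_t addr)
{
    return (addr / L1P_LINE_BYTES) % L1P_NUM_LINES;
}

/* Two addresses compete for a line if they share an index
 * but come from different 4K-byte blocks of memory. */
static int l1p_conflict(uint32_t a, uint32_t b)
{
    return l1p_line_index(a) == l1p_line_index(b) &&
           (a / L1P_SIZE_BYTES) != (b / L1P_SIZE_BYTES);
}
```

Two functions placed exactly 4K bytes apart, for example at 0x80000000 and 0x80001000, start on the same line and would evict each other, which is why placement matters for thrashing.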

Page 42:

New Term: Linesize

[Figure: External Memory with one Block marked and addresses 0x8000, 0x8010, 0x8020, beside a Cache with locations 0 to 0xF.]

In our earlier cache example, the size was:
Size: 16 bytes
Linesize: 1 byte
# of indexes: 16

How else could it be configured?

Page 43:

New Term: Linesize

[Figure: External Memory with one Block marked and addresses 0x8000, 0x8010, 0x8020, beside a Cache with indexes 0 to 0x7, each line holding two locations (0 and 1 through 0xE and 0xF).]

In our earlier cache example, the size was:
Size: 16 bytes
Linesize: 1 byte
# of indexes: 16

We have now changed it to:
Size: 16 bytes
Linesize: 2 bytes
# of indexes: 8

What's the advantage of greater line size?
Speed! When cache retrieves one item, it gets another at the same time.
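To make the linesize change concrete, here is a small sketch that splits an address into tag, index, and offset for a given cache size and linesize. The struct and function are hypothetical; the 16-byte cache and the two linesizes are the ones from the slide.

```c
#include <assert.h>

typedef struct {
    unsigned offset;   /* position within the line */
    unsigned index;    /* which cache line */
    unsigned tag;      /* which memory block */
} AddrFields;

static AddrFields split_address(unsigned addr,
                                unsigned cache_bytes,
                                unsigned line_bytes)
{
    AddrFields f;
    unsigned num_lines;

    assert(line_bytes != 0 && cache_bytes % line_bytes == 0);
    num_lines = cache_bytes / line_bytes;

    f.offset = addr % line_bytes;
    f.index  = (addr / line_bytes) % num_lines;
    f.tag    = addr / cache_bytes;
    return f;
}
```

With a 1-byte linesize, address 0x8013 maps to index 3. With a 2-byte linesize, 0x8012 and 0x8013 share index 1 and differ only in the offset, so fetching one brings in the other.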

Page 45:

Caching Data

One instruction may access multiple data elements:

for( i = 0; i < 4; i++ ) { sum += x[i] * y[i]; }

[Figure: a 4K data cache with Tag and Data columns beside External Memory holding x and y.]

What would happen if x and y ended up at the following addresses?
x = 0x8000
y = 0x9000

They would end up overwriting each other in the cache, which is called thrashing
Increasing the associativity of the cache will reduce this problem

How do you increase associativity?
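The thrashing case on this slide can be counted with a short host-side model of a 4K direct-mapped data cache. This is a sketch under stated assumptions: the slide gives no linesize or element type, so a 32-byte line and 2-byte elements are assumed here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BYTES 4096u                       /* 4K data cache from the slide */
#define LINE_BYTES  32u                         /* assumed linesize */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

/* Count misses for the access pattern x[0], y[0], x[1], y[1], ... */
static unsigned dm_count_misses(uint32_t x_base, uint32_t y_base,
                                unsigned n, unsigned elem_bytes)
{
    uint32_t tags[NUM_LINES] = {0};
    uint8_t  valid[NUM_LINES] = {0};
    unsigned misses = 0;

    assert(elem_bytes > 0);
    for (unsigned i = 0; i < n; i++) {
        const uint32_t addrs[2] = { x_base + i * elem_bytes,
                                    y_base + i * elem_bytes };
        for (unsigned k = 0; k < 2; k++) {
            unsigned index = (addrs[k] / LINE_BYTES) % NUM_LINES;
            uint32_t tag   = addrs[k] / CACHE_BYTES;
            if (!valid[index] || tags[index] != tag) {
                misses++;                       /* line refilled, old contents lost */
                tags[index]  = tag;
                valid[index] = 1;
            }
        }
    }
    return misses;
}
```

With x at 0x8000 and y at 0x9000, all 8 accesses of the 4-iteration loop miss, because both arrays map to line 0. Moving y by one line (to 0x9020, a made-up address) leaves only the 2 initial misses.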

Page 46:

Increased Associativity

[Figure: External Memory addresses 0x08000, 0x10800, 0x11000, 0x11800 beside a Data Cache with Valid, Tag, and Data columns, split into Way 0 and Way 1, each spanning 0 to 2K.]

Split a Direct-Mapped Cache in half
Each half is called a cache way
Multiple ways make data caches more efficient

Page 47:

Increased Associativity

[Figure: External Memory addresses 0x08000, 0x10800, 0x11000, 0x11800 beside a Data Cache with Valid, Tag, and Data columns, split into Way 0 and Way 1, each spanning 0 to 2K.]

Split a Direct-Mapped Cache in half
Each half is called a cache way
Multiple ways make data caches more efficient

C671x/C621x L1D dimensions:
4K Bytes
2 Ways
32 Byte linesize
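The L1D dimensions listed above imply the rest of the geometry: each way is 2K bytes, and with 32-byte lines that gives 64 sets. A minimal sketch with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

#define L1D_BYTES       4096u
#define L1D_WAYS        2u
#define L1D_LINE_BYTES  32u
#define L1D_WAY_BYTES   (L1D_BYTES / L1D_WAYS)
#define L1D_NUM_SETS    (L1D_WAY_BYTES / L1D_LINE_BYTES)

/* Documents the derived geometry: 2K bytes per way, 64 sets. */
static void l1d_check_geometry(void)
{
    assert(L1D_WAY_BYTES == 2048u);
    assert(L1D_NUM_SETS == 64u);
}

/* Which set a data address maps to; either way of that set may hold it. */
static unsigned l1d_set_index(uint32_t addr)
{
    return (addr / L1D_LINE_BYTES) % L1D_NUM_SETS;
}
```

Addresses that are a multiple of 2K bytes apart, such as 0x8000 and 0x9000, still map to the same set, but the set now has two lines to hold them.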

Page 48:

What is a Set?

[Figure: External Memory addresses 0x8000, 0x8008, 0x8010, 0x8018 beside a two-way Data Cache; the index-0 lines are marked as the set of index zeros (Set 0), followed by Set 1.]

The lines from each way that map to the same index form a set
The number of lines per set defines the cache as an N-way set-associative cache

Now, how many unique locations are there for each memory address?

Page 49:

What is a Set?

[Figure: External Memory addresses 0x8000, 0x8008, 0x8010, 0x8018 beside a two-way Data Cache; the index-0 lines are marked as the set of index zeros (Set 0), followed by Set 1.]

The lines from each way that map to the same index form a set
The number of lines per set defines the cache as an N-way set-associative cache
With 2 ways, there are now 2 unique cache locations for each memory address
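A two-way set-associative lookup can be sketched as follows. The geometry matches the C671x/C621x L1D dimensions given earlier; the replacement policy is assumed to be least-recently-used, which the slides do not state, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAYS        2u
#define WAY_BYTES   2048u                       /* 4K bytes split into two ways */
#define LINE_BYTES  32u
#define NUM_SETS    (WAY_BYTES / LINE_BYTES)    /* 64 */

typedef struct {
    uint32_t tag[NUM_SETS][WAYS];
    uint8_t  valid[NUM_SETS][WAYS];
    uint8_t  victim[NUM_SETS];   /* way to replace next (assumed LRU) */
} TwoWayCache;

/* Returns 1 on a hit, 0 on a miss (after filling the line). */
static int tw_access(TwoWayCache *c, uint32_t addr)
{
    unsigned set = (addr / LINE_BYTES) % NUM_SETS;
    uint32_t tag = addr / WAY_BYTES;
    unsigned w;

    for (w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->victim[set] = (uint8_t)(1u - w);   /* the other way is now older */
            return 1;
        }
    }
    w = c->victim[set];
    c->tag[set][w]   = tag;
    c->valid[set][w] = 1;
    c->victim[set]   = (uint8_t)(1u - w);
    return 0;
}

/* Count misses for the access pattern x[0], y[0], x[1], y[1], ... */
static unsigned tw_count_misses(uint32_t x_base, uint32_t y_base,
                                unsigned n, unsigned elem_bytes)
{
    TwoWayCache c;
    unsigned misses = 0;

    assert(elem_bytes > 0);
    memset(&c, 0, sizeof c);
    for (unsigned i = 0; i < n; i++) {
        misses += !tw_access(&c, x_base + i * elem_bytes);
        misses += !tw_access(&c, y_base + i * elem_bytes);
    }
    return misses;
}
```

With x at 0x8000 and y at 0x9000 (2-byte elements assumed), the 4-iteration loop now takes only 2 misses, because x and y occupy the two ways of Set 0. A third array mapping to the same set would bring the thrashing back.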

Page 51:

'11 / '12 Internal Memory

[Figure: CPU connected to L1 Program (4KB) and L1 Data (4KB), both backed by L2 Program & Data (64K Bytes); labeled 8/16/32/64.]

Level 1 Caches
Single-cycle access
Always enabled
L2 accessed on miss

Level 2 Memory
Unified: Prog or Data
L2 → L1D delivers 32 bytes in 4 cycles
L2 → L1P delivers 16 instr's in 5 cycles
Configure L2 as cache or addressable RAM
(C6713: L2 memory is 256K bytes)

If cache is so great, why is L2 configurable?

Page 52:

If L2 didn't have addressable RAM?

Requires external storage of peripheral data
Both EDMA and CPU must tie up EMIF to store and retrieve data

[Figure: Cache, Enhanced DMA (EDMA), EMIF, External Memory, and Peripheral Port.]

Page 53:

C6000 Level 2 - Flexible & Efficient

[Figure: L2 with one part Mapped as RAM and one part as Cache, plus Enhanced DMA (EDMA), EMIF, External Memory, and Peripheral Port.]

Configure L2 as cache and/or mapped-RAM
Allows peripheral data or critical code and data storage on-chip

Page 54:

L2 Memory Configuration

Four 16KB blocks: configure each as cache or addressable RAM
Each additional cache block provides another cache way
L2 is unified memory: it can hold program or data
C6713 still has 4 configurable 16KB RAM/cache blocks; the remaining 192KB is always RAM

[Figure: the five L2 configurations: RAM 0 to RAM 3; RAM 0 to RAM 2 plus Way 1; RAM 0 and RAM 1 plus Ways 1 and 2; RAM 0 plus Ways 1 to 3; and Ways 1 to 4. Labels mark the hardware default and the dsk6711.cdb template default.]
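The arithmetic behind these configurations is simple enough to sketch. The functions below are hypothetical helpers for reasoning on a host machine, not a TI API: each block switched to cache adds one 16KB way and removes 16KB of addressable RAM.

```c
#include <assert.h>

#define L2_BLOCK_KB            16u
#define L2_CONFIGURABLE_BLOCKS 4u

/* Addressable RAM left when `cache_ways` blocks are used as cache.
 * total_l2_kb is 64 for the '11/'12 devices and 256 for the C6713. */
static unsigned l2_sram_kb(unsigned total_l2_kb, unsigned cache_ways)
{
    assert(cache_ways <= L2_CONFIGURABLE_BLOCKS);
    assert(total_l2_kb >= L2_CONFIGURABLE_BLOCKS * L2_BLOCK_KB);
    return total_l2_kb - cache_ways * L2_BLOCK_KB;
}

/* Total L2 cache size for a given number of ways. */
static unsigned l2_cache_kb(unsigned cache_ways)
{
    assert(cache_ways <= L2_CONFIGURABLE_BLOCKS);
    return cache_ways * L2_BLOCK_KB;
}
```

For a 64KB L2, one cache way leaves 48KB of RAM and four ways leave none. On the C6713, four ways still leave the fixed 192KB of RAM.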

Page 55:

Configuring L2 Cache with CDB

Page 57:

C64x Internal Memory

[Figure: CPU connected to L1 Program (16KB) and L1 Data (16KB), both backed by L2 Program & Data (1M Bytes); labeled 8/16/32/64.]

L1 Program Cache
Direct Mapped (1 way)
Single cycle access
Size = 16K Bytes
Linesize = 8 instr.

L1 Data Cache
2-Way Cache
Single cycle access
Size = 16K Bytes
Linesize = 64 bytes

Level 2 Memory
C6414/15/16 = 1M Byte
C6411/DM642 = 256K Byte

Page 58:

C64x L2 Memory Configuration

When cache is enabled, it's always 4-Way
This differs from C671x

Linesize
Linesize = 128 bytes
Same linesize as C671x

Performance
L2 → L1P: 1-8 cycles
L2 → L1D:
L2 SRAM hit: 6 cycles
L2 Cache hit: 8 cycles
Pipelined: 2 cycles

L2 Ways are Configurable in Size
0, 32K, 64K, 128K, 256K

Page 59:

Outline
Why Cache?
Cache Basics
Cache Example (Direct-Mapped)
C6211/C671x Internal Memory
'C64x Internal Memory Overview
Additional Memory/Cache Topics
  Data Mem Bank
  Cache Optimization
  Cache Coherency
  MAR Registers
Using the C Optimizer
Lab 15

Page 60:

'C641x L1D Memory Banks

[Figure: eight 512x32 memory blocks, each labeled 0 1 2 3 4 5 6 7.]

#pragma DATA_MEM_BANK(a, 0);
#pragma DATA_MEM_BANK(x, 4);

Only one access allowed per bank per cycle
Use DATA_MEM_BANK to make sure that arrays that will be accessed in parallel start in different banks
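The effect of starting two arrays in different banks can be checked with a short host-side sketch. It assumes eight banks that are each 32 bits wide and interleaved on consecutive words, so that bank = (address / 4) mod 8; that mapping is an assumption consistent with the figure rather than something the slide spells out. The function names and the example addresses are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS        8u
#define BANK_WIDTH_BYTES 4u     /* assumed: each bank is 32 bits wide */

/* Bank that a byte address falls in, under the interleaving assumption. */
static unsigned bank_of(uint32_t addr)
{
    return (addr / BANK_WIDTH_BYTES) % NUM_BANKS;
}

/* Count loop iterations in which a[i] and x[i] fall in the same bank
 * and therefore could not both be accessed in one cycle. */
static unsigned count_same_bank(uint32_t a_base, uint32_t x_base,
                                unsigned n, unsigned elem_bytes)
{
    unsigned same = 0;

    assert(elem_bytes > 0);
    for (unsigned i = 0; i < n; i++) {
        if (bank_of(a_base + i * elem_bytes) ==
            bank_of(x_base + i * elem_bytes))
            same++;
    }
    return same;
}
```

If `a` starts in bank 0 and `x` in bank 4, as the two pragmas request, `a[i]` and `x[i]` never share a bank for 2-byte elements. If both arrays started in bank 0, every iteration would collide.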

Page 61:

DATA_MEM_BANK(var, 0 or 2 or 4 or 6)

#pragma DATA_MEM_BANK(a, 0);
short a[256] = {1, 2, 3, …

#pragma DATA_MEM_BANK(x, 4);
short x[256] = {256, 255, 254, …

#pragma UNROLL(2);
#pragma MUST_ITERATE(10, 100, 2);
for(i = 0; i < count ; i++) {
    sum += a[i] * x[i];
}

An internal memory specialized Data Align
Optimizes variable placement to account for the way internal memory is organized

Page 63:

Cache Optimization

Optimize for Level 1
Multiple ways and wider lines maximize efficiency: we did this for you!
Main goal: maximize line reuse before eviction
Algorithms can be optimized for cache
"Touch Loops" can help with compulsory misses
Up to 4 write misses can happen sequentially, but the next read or write will stall
Be smart about data output by one function then read by another (touch it first)
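A touch loop simply reads one location in every cache line of a buffer before the real processing starts, so the compulsory misses are taken up front. The sketch below is a generic illustration, not TI's library routine; the function name is hypothetical and the 64-byte linesize is an assumption taken from the C64x L1D figure quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64u   /* assumed linesize; adjust for the target cache */

/* Read one byte per cache line so each line is brought into cache.
 * Returns the number of strided reads performed. */
static size_t touch_buffer(const void *buf, size_t nbytes)
{
    const volatile unsigned char *p = (const volatile unsigned char *)buf;
    size_t reads = 0;

    assert(buf != NULL || nbytes == 0);
    for (size_t off = 0; off < nbytes; off += LINE_BYTES) {
        (void)p[off];            /* volatile read cannot be optimized away */
        reads++;
    }
    if (nbytes > 0)
        (void)p[nbytes - 1];     /* covers a final line if buf is not line-aligned */
    return reads;
}
```

Calling `touch_buffer(x, sizeof x)` before a processing loop would perform one read per 64 bytes, for example 4 reads for a 256-byte buffer.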

Page 64:

Updated Cache Documentation

Cache Reference Guides for C621x/C671x (SPRU609) and C64x (SPRU610)
Replaces "Two-Level Internal Memory" chapter in Peripherals Reference Guide
More comprehensive description of C6000 cache
Revised terminology for cache coherence operations

Cache User's Guide for C6000 (SPRU656)
Cache Basics
Using C6000 Cache
Optimization for Cache Performance

Page 66:

Coherency Example: Description

[Figure: CPU, L1D, L2 (Cache), and External memory holding RcvBuf and XmtBuf, each attached to the EDMA.]

For this example, L2 is set up as cache
Example's Data Flow:
EDMA fills RcvBuf
CPU reads RcvBuf, processes data, and writes to XmtBuf
EDMA moves data from XmtBuf (e.g. to a D/A converter)

Page 67:

EDMA Writes Buffer

[Figure: CPU, L1D, L2, and External memory; the EDMA is attached to RcvBuf.]

Buffer (in external memory) written by the EDMA

Page 68:

CPU Reading Buffers

[Figure: CPU, L1D, L2, and External memory; the EDMA is attached to RcvBuf.]

CPU reads the buffer for processing
This read causes a cache miss in L1D and L2

Page 69:

CPU Reading Buffers

[Figure: CPU, L1D, L2, and External memory; RcvBuf now appears three times, and the EDMA is attached to it.]

CPU reads the buffer for processing
This read causes a cache miss in L1D and L2
RcvBuf is added to both caches
Space is allocated in each cache
RcvBuf data is copied to both caches

Where Does EDMA Read From?

[Diagram: RcvBuf held in L1D, L2, and external memory; the EDMA reads XmtBuf from external memory]

The EDMA reads from the external memory address where XmtBuf is located
Will the EDMA get the right data? Where does the new data reside?

Where Does the CPU Write To?

[Diagram: the CPU writes XmtBuf into L2; the EDMA reads the XmtBuf location in external memory]

After processing, the CPU writes to XmtBuf
Write misses to L1D are written directly to the next level of memory (L2)
Thus, the write does not go directly to external memory
Cache line allocated:
  L1D on read only
  L2 on read or write

A Coherency Issue

[Diagram: the up-to-date XmtBuf is in L2; the EDMA reads the XmtBuf location in external memory]

EDMA is set up to transfer the buffer from ext. mem
The buffer resides in cache, not in ext. memory
So, the EDMA transfers whatever is in ext. memory, probably not what you wanted

There are three solutions to data coherency ...
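The stale-buffer problem described on this slide can be mimicked on a host PC with a toy model. This is only a sketch: two arrays stand in for the L2 copy and the external-memory copy of XmtBuf, and a plain memcpy stands in for a writeback. All names here (`toy_cpu_write`, `toy_writeback`, `toy_edma_read`) are hypothetical and model nothing about the real cache hardware beyond the order in which data is copied.

```c
#include <string.h>

#define BUF_LEN 4

/* Toy stand-ins: one array for the cached copy of XmtBuf (in L2),
 * one for the copy in external memory that the EDMA actually reads.
 * Both start out zeroed. */
static short l2_copy[BUF_LEN];
static short ext_mem[BUF_LEN];

/* CPU write: the data lands in the cache (L2), not in external memory. */
static void toy_cpu_write(const short *src)
{
    memcpy(l2_copy, src, sizeof l2_copy);
}

/* Writeback: copy the cached data out to external memory. */
static void toy_writeback(void)
{
    memcpy(ext_mem, l2_copy, sizeof ext_mem);
}

/* EDMA read: sees external memory only, never the cache. */
static void toy_edma_read(short *dst)
{
    memcpy(dst, ext_mem, sizeof ext_mem);
}
```

Calling `toy_cpu_write()` followed directly by `toy_edma_read()` returns the old (zeroed) contents; inserting `toy_writeback()` between the two returns the data the CPU wrote.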

Solution 1: Flush & Clear the Cache

[Diagram: XmtBuf is written back from L2 to external memory, where the EDMA reads it]

When the CPU is finished with the data (and has written it to XmtBuf in L2), it can be sent to ext. memory with a cache writeback
A writeback is a copy operation from cache to memory
CSL (Chip Support Library) provides an API for writeback:
  CACHE_wbL2((void *)XmtBuf, bytecount, CACHE_WAIT);

Before looking at the next solution, there's one other coherency issue ...

Another Coherency Issue

[Diagram: the EDMA writes a new RcvBuf to external memory while the old RcvBuf is still held in L1D and L2]

EDMA writes a new RcvBuf buffer to ext. memory
When the CPU reads RcvBuf a cache hit occurs, since the buffer (with old data) is still valid in cache
Thus, the CPU reads the old data instead of the new

Another Coherency Solution

[Diagram: the RcvBuf lines in L1D and L2 are invalidated so the next CPU read fetches from external memory]

To get the new data, you must first invalidate the old data before trying to read the new data (clears cache line's valid bits)
CSL provides an API to writeback with invalidate:
  It writes back modified (i.e. dirty) data,
  Then invalidates cache lines containing the buffer
  CACHE_wbInvL2((void *)RcvBuf, bytecount, CACHE_WAIT);

A quick summary of L2 writeback/invalidate functions …

L2 Cache Coherence Operations

For block operations, only the lines in L1D or L1P with addresses corresponding to the addresses of L2 operations are affected

Careful: Cache always invalidates/writes back whole lines. To avoid unexpected coherence problems: align buffers at a boundary equal to the cache line size and make the size of the buffers a multiple of the cache line size

Type / Scope: Invalidate, Block
  L2 CSL function: CACHE_invL2(ext memory base addr, byte count, wait)
  L2 cache operation: lines invalidated
  Effect on L1 caches: corresponding lines invalidated in L1D & L1P; any L1D updates discarded

Type / Scope: Writeback, Block
  L2 CSL function: CACHE_wbL2(ext memory base addr, byte count, wait)
  L2 cache operation: dirty lines written back; lines remain valid
  Effect on L1 caches: L1D: updated data written back, then corresponding lines invalidated; L1P: no effect

Type / Scope: Writeback with Invalidate, Block
  L2 CSL function: CACHE_wbInvL2(ext memory base addr, byte count, wait)
  L2 cache operation: dirty lines written back; lines invalidated
  Effect on L1 caches: L1D: updated data written back, then corresponding lines invalidated; L1P: corresponding lines invalidated

Type / Scope: Writeback, All
  L2 CSL function: CACHE_wbAllL2(wait)
  L2 cache operation: updated lines written back; all lines remain valid
  Effect on L1 caches: L1D: updated data written back, then all lines invalidated; L1P: no effect

Type / Scope: Writeback with Invalidate, All
  L2 CSL function: CACHE_wbInvAllL2(wait)
  L2 cache operation: updated lines written back; all lines invalidated
  Effect on L1 caches: L1D: updated data written back, then all lines invalidated; L1P: all lines invalidated
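The "Careful" note above (align buffers to a line boundary, size them as a multiple of the line size) could be applied along these lines. This is a sketch under assumptions: the 128-byte L2 line size should be checked against the device documentation, and `L2_LINE_SIZE`, `ROUND_UP_TO_LINE`, `RCV_SAMPLES`, and `rcvBuf` are hypothetical names. The `DATA_ALIGN` pragma is the TI compiler pragma, so it is guarded by the compiler's predefined `_TMS320C6X` macro and skipped on other compilers.

```c
#include <stddef.h>

/* Assumed L2 line size in bytes; confirm against the device data sheet. */
#define L2_LINE_SIZE 128u

/* Round a byte count up to the next multiple of the line size. */
#define ROUND_UP_TO_LINE(n) \
    ((((n) + L2_LINE_SIZE - 1u) / L2_LINE_SIZE) * L2_LINE_SIZE)

/* Hypothetical buffer: 100 16-bit samples need 200 bytes,
 * which rounds up to two full 128-byte lines. */
#define RCV_SAMPLES 100u
#define RCV_BYTES   ROUND_UP_TO_LINE(RCV_SAMPLES * sizeof(short))

#ifdef _TMS320C6X
#pragma DATA_ALIGN(rcvBuf, 128);   /* TI compiler: start on a line boundary */
#endif
short rcvBuf[RCV_BYTES / sizeof(short)];
```

With the buffer padded this way, a block writeback or invalidate on `rcvBuf` touches no line that also holds unrelated data.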

When to Use Coherency Functions?

Use when the CPU and EDMA share a cacheable region in external memory
Safest: Use L2 Writeback-Invalidate All before any EDMA transfer to/from external memory
  Disadvantage: larger overhead
Reduce overhead by:
  Only operating on buffers used for EDMA, and
  Distinguishing between three possible scenarios:
    1. EDMA reads data written by the CPU: Writeback before EDMA
    2. EDMA writes data to be read by the CPU: Invalidate before EDMA*
    3. EDMA modifies data written by the CPU that is to be read back by the CPU: Writeback-Invalidate before EDMA

* For C6211/6711 use Writeback-Invalidate before EDMA
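The three scenarios above map onto the three block functions from the summary table. The sketch below wraps each one in a helper; the helper names (`before_edma_reads`, `before_edma_writes`, `before_edma_modifies`) are hypothetical. On a C6000 target it assumes the CSL headers are available and the project defines the chip symbol CSL expects. On any other compiler, small stand-ins replace the CSL calls so the sketch still compiles; they are not part of CSL and only record which operation was requested.

```c
#ifdef _TMS320C6X
#include <csl.h>
#include <csl_cache.h>
#else
/* Host-only stand-ins (NOT part of CSL): record the requested operation. */
enum { CACHE_WAIT = 1 };
enum { OP_NONE, OP_WB, OP_INV, OP_WBINV };
static int last_op = OP_NONE;
static void CACHE_wbL2(void *p, unsigned n, int w)
{ (void)p; (void)n; (void)w; last_op = OP_WB; }
static void CACHE_invL2(void *p, unsigned n, int w)
{ (void)p; (void)n; (void)w; last_op = OP_INV; }
static void CACHE_wbInvL2(void *p, unsigned n, int w)
{ (void)p; (void)n; (void)w; last_op = OP_WBINV; }
#endif

/* Scenario 1: the CPU wrote buf and the EDMA is about to read it. */
void before_edma_reads(void *buf, unsigned bytes)
{
    CACHE_wbL2(buf, bytes, CACHE_WAIT);
}

/* Scenario 2: the EDMA is about to write buf and the CPU will read it.
 * Per the slide's footnote, C6211/6711 would use CACHE_wbInvL2 here. */
void before_edma_writes(void *buf, unsigned bytes)
{
    CACHE_invL2(buf, bytes, CACHE_WAIT);
}

/* Scenario 3: the EDMA modifies data the CPU wrote and will read back. */
void before_edma_modifies(void *buf, unsigned bytes)
{
    CACHE_wbInvL2(buf, bytes, CACHE_WAIT);
}
```

Each helper would be called just before the corresponding EDMA transfer is started, with a buffer that is line-aligned and a whole number of lines long.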

Solution 2: Keep Buffers in L2

[Diagram: RcvBuf and XmtBuf located in L2 RAM and accessed there directly by the EDMA]

Configure some of L2 as RAM
Locate buffers in this RAM space
Coherency issues do not exist between L1 and L2


"Turn Off" the Cache (MAR)

[Diagram: with caching disabled, the CPU accesses RcvBuf and XmtBuf directly in external memory, the same copies the EDMA uses]

The Memory Attribute Registers (MARs) enable/disable caching for a memory range
Turning off the cache can solve coherency issues, but without cache, access to memory is slow

Looking more closely at the MAR registers ...

Memory Attribute Regs (MAR)

[Diagram: external memory map (CE0, CE2, CE3, Reserved) with one MAR bit per range: MAR4 = 0, MAR5 = 1, MAR6 = 1, MAR7 = 1; 0 = Not cached, 1 = Cached]

Use MAR registers to enable/disable caching of external ranges
Useful when external data is modified outside the scope of the CPU
You can specify MAR values in Config Tool
C671x: 16 MARs, 4 per CE space, each handles 16MB
C64x: 256 MARs, 16 per CE space, each handles 16MB (on current C64x, some are reserved)

Setting MARs in CDB files ...
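Because each MAR covers a fixed 16MB range, the MAR that governs a given external address can be computed from the address alone. The sketch below is an illustration under assumptions not stated on the slide: that C64x MAR numbers run linearly across the 4GB address map (so the number is the top 8 address bits), and that on C671x the four CE spaces start at 0x80000000, 0x90000000, 0xA0000000, and 0xB0000000 with MAR0 through MAR3 covering the first 64MB of CE0. Both should be checked against the device's memory map. The function names are hypothetical.

```c
#include <stdint.h>

/* C64x (assumed mapping): one MAR per 16MB across the whole address
 * map, so the MAR number is the top 8 bits of the address. */
static unsigned mar_index_c64x(uint32_t addr)
{
    return addr >> 24;
}

/* C671x (assumed mapping): 4 MARs per CE space, CE0..CE3 based at
 * 0x80000000, 0x90000000, 0xA0000000, 0xB0000000.
 * Returns -1 for addresses no MAR covers. */
static int mar_index_c671x(uint32_t addr)
{
    uint32_t ce, slot;

    if (addr < 0x80000000u || addr > 0xBFFFFFFFu)
        return -1;                      /* not in a CE space */

    ce   = (addr >> 28) - 8u;           /* 0..3 */
    slot = (addr >> 24) & 0xFu;         /* 16MB slot within the CE space */
    if (slot > 3u)
        return -1;                      /* beyond the 64MB the 4 MARs cover */

    return (int)(ce * 4u + slot);
}
```

Under these assumptions, a buffer at 0x80000000 is governed by MAR0 on C671x and by MAR128 on C64x.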

Setting MARs in CDB (C67x)

[Screenshot: Config Tool (CDB) dialog listing MAR0 through MAR15; MAR0 = 00000001, MAR1 = 00000000, MAR2 = 00000000, MAR3 = 00000000, ..., MAR15 = 00000000]

MAR bit values: 0 = Not cached, 1 = Cached

Finally, for more info ...

Setting MARs in CDB (C64x)

[Screenshot: Config Tool (CDB) dialog with the C64x MAR settings]

MAR bit values: 0 = Not cached, 1 = Cached


Compiler Build Options

Nearly one hundred compiler options are available to tune your code's performance, size, etc.
The following table lists the most commonly used options:

Options      Description
-mv6700      Generate ‘C67x code (‘C62x is default)
-mv67p       Generate ‘C672x code
-mv6400      Generate ‘C64x code
-mv6400+     Generate ‘C64x+ code
-fr <dir>    Directory for object/output files
-fs <dir>    Directory for assembly files
Debug
-g           Enables src-level symbolic debugging
-ss          Interlist C statements into assembly listing
Optimize (release)
-o3          Invoke optimizer (-o0, -o1, -o2/-o, -o3)
-k           Keep asm files, but don't interlist

To make options easier, TI recommends using …

Default Build Configurations

For new projects, CCS automatically creates two build configurations:
  Debug (unoptimized):  -g -fr"$(Proj_dir)\Debug" -d"_DEBUG" -mv6700
  Release (optimized):  -o3 -k -fr"$(Proj_dir)\Release" -mv6700
Use the drop-down to quickly select build config.
Add/Remove build configs with Project Configurations dialog (on project menus)
Edit a configuration:
  1. Set it active
  2. Modify build options (shown next)
  3. Save project

For more help …

Optimizing C Performance

Compiler Tutorial (in CCS Help & SPRU425a.pdf)
C6000 Programmer's Guide (SPRU198)
Chapter 4: "Optimizing C Code"
C6000 Optimizing C Compiler UG (SPRU187)

Lab 15/15A

LAB 15
  Move buffers off-chip
  Turn on L2 cache
  Investigate MAR bits
  Solve coherency issues with writeback/invalidate
  Use cache debug techniques

LAB 15A
  Use Release Configuration
  Benchmark performance and code size

Outline
Why Cache?
Cache Basics
Cache Example (Direct-Mapped)
C6211/C671x Internal Memory
‘C64x Internal Memory Overview
Additional Memory/Cache Topics
Using the C Optimizer
Lab 15
Optional Topics
  ‘0x Memory Summary
  ‘0x Data Memory Optimization

Lab 15 – Internal Mem. and Cache

Lab 15a – Find Memory Bank Conflicts
  C64x Users Only
Lab 15b – Compare the performance:
  1. Code/Data Off-Chip with No Cache
  2. Code/Data Off-Chip with Cache
  3. Code/Data On-Chip
Lab 15c – Exploring CacheTune


‘0x Internal Memory

[Diagram: internal program and data memory blocks for three device groups]

‘C6x01/04/05 (1M bit total)
  Program: Cache / RAM
  Data: Internal Data
‘C6202 (3M bit total)
  Program: Cache / RAM 128K bytes, plus RAM 128K bytes
  Data: Internal Data 128K bytes
‘C6203 (7M bit total)
  Program: Cache / RAM 128K bytes, plus RAM 256K bytes
  Data: Internal Data 512K bytes

Only program cache (no data cache)
Configurable as Cache or RAM
Single-cycle access

‘C6201 Internal Data

Split into 4 banks
Dual access to two banks in 1 cycle
Dual accesses to one bank results in one cycle delay

[Diagram: four 8Kx16 banks, each with its own address (A) and 16-bit data (D) connection]

‘C6201 Internal Data

Banks are interleaved

[Diagram: four 8Kx16 banks, each with its own address (A) and 16-bit data (D) connection]

Byte ordering:
  Bank 0: bytes 0 1, then 8 9
  Bank 1: bytes 2 3, then A B
  Bank 2: bytes 4 5, then C D
  Bank 3: bytes 6 7, then E F

‘C6201 Internal Data

Banks are interleaved

[Diagram: the four interleaved 16-bit banks (bytes 0 through F) with two LDW accesses marked]

How many cycles would these two LDW accesses take? 1

‘C6201 Internal Data

[Diagram: the four interleaved 16-bit banks (bytes 0 through F) with two different LDW accesses marked]

Now, how many cycles would it take for these two LDWs? 2
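The two answers above follow from which banks each LDW touches. The sketch below encodes the bank model from these slides (4 banks, 16 bits wide, interleaved on halfword boundaries) and reports whether two parallel word loads collide. The function names are hypothetical, and since the slides' highlighted addresses did not survive extraction, the addresses in the usage note are illustrative only.

```c
#include <stdint.h>

/* Bank (0..3) that holds a given byte address: banks are 16 bits wide
 * and interleaved, so the bank is chosen by address bits 2:1. */
static unsigned bank_of(uint32_t byte_addr)
{
    return (byte_addr >> 1) & 3u;
}

/* Bit mask of the banks touched by an aligned 32-bit LDW:
 * a word spans two adjacent 16-bit banks. */
static unsigned ldw_banks(uint32_t word_addr)
{
    return (1u << bank_of(word_addr)) | (1u << bank_of(word_addr + 2u));
}

/* Cycles for two LDWs issued in parallel under this model:
 * 1 if they use disjoint banks, 2 if they share any bank. */
static unsigned parallel_ldw_cycles(uint32_t addr_a, uint32_t addr_b)
{
    return (ldw_banks(addr_a) & ldw_banks(addr_b)) ? 2u : 1u;
}
```

For example, loads from byte addresses 0x0 and 0x4 use banks 0-1 and 2-3 and take one cycle, while loads from 0x0 and 0x8 both use banks 0-1 and take two.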

‘0x Memory Optimization

Basic Memory Layout
Improving Performance
  Offset Arrays
  Unroll Loop
Aren't There Two Blocks?

Solution 1: Offset Arrays

Offset accesses

#pragma DATA_ALIGN(x, 8);
#pragma DATA_ALIGN(a, 8);

int x[40] = {1, 2, 3, … };
int a[41] = {0, 40, 39, 38, … };

int *xp = &x[0];
int *ap = &a[1];

[Diagram: x0 = x[0] = 1 occupies bytes 0-3; a[0] = 0 comes first in array a, so a0 = a[1] = 40 occupies bytes C-F, in the other two banks]
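The effect of the one-element offset can be checked with a small host-side calculation. This sketch assumes the 4-bank, 16-bit interleaved model from the earlier slides, 4-byte `int` elements, and both arrays starting on 8-byte boundaries (as the `DATA_ALIGN` pragmas request). The base addresses passed in are hypothetical, as are the function names.

```c
#include <stdint.h>

/* Bank pair hit by an aligned 32-bit load under the 4-bank,
 * 16-bit-interleaved model: 0 means banks 0-1, 1 means banks 2-3. */
static unsigned bank_pair(uint32_t byte_addr)
{
    return (byte_addr >> 2) & 1u;
}

/* Count loop iterations in which x[i] and a[i + a_offset] fall in the
 * same bank pair, i.e. would conflict if loaded in parallel. */
static unsigned count_conflicts(uint32_t x_base, uint32_t a_base,
                                unsigned a_offset, unsigned n)
{
    unsigned i, conflicts = 0;

    for (i = 0; i < n; i++) {
        uint32_t x_addr = x_base + 4u * i;
        uint32_t a_addr = a_base + 4u * (i + a_offset);
        if (bank_pair(x_addr) == bank_pair(a_addr))
            conflicts++;
    }
    return conflicts;
}
```

With both bases 8-byte aligned, an offset of 0 conflicts on every iteration, while an offset of 1 (reading `a` from `a[1]`, as on the slide) conflicts on none.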

Solution 2: Unroll Loop

Offset accesses
Unroll the loop:
  Read two values from each array in parallel, then perform two calculations

[Diagram: four 16-bit banks; x0 and x1 are adjacent words in different banks, as are a0 and a1]

   LDW from x0
|| LDW from x1

   LDW from a0
|| LDW from a1

   ADD x0, a0
|| ADD x1, a1
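In C, the same idea corresponds to unrolling the loop by two so that each iteration loads adjacent elements of one array together. The sketch below is only an illustration: the function name, the output array `y`, and the element-wise add are assumptions (the slide shows just the loads and the two ADDs), and whether the compiler actually pairs the loads depends on the optimizer and build options.

```c
/* Element-wise add, unrolled by two.
 * Assumes n is even; y is a hypothetical output array. */
void add_unrolled(const int *x, const int *a, int *y, int n)
{
    int i;

    for (i = 0; i < n; i += 2) {
        /* Adjacent words of one array lie in different banks,
         * so each pair of loads can be issued in parallel. */
        int x0 = x[i], x1 = x[i + 1];   /* LDW x0 || LDW x1 */
        int a0 = a[i], a1 = a[i + 1];   /* LDW a0 || LDW a1 */

        y[i]     = x0 + a0;             /* ADD x0, a0 */
        y[i + 1] = x1 + a1;             /* ADD x1, a1 */
    }
}
```

Unlike the offset-array approach, this pairing does not depend on how `x` and `a` are placed relative to each other, only on each array being word-aligned.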


Two Blocks of Memory (4 banks each)

[Diagram: two blocks, each with four 4Kx16 banks and 16-bit data connections, both using the interleaved byte ordering shown earlier (bytes 0 through F)]

Why use offset-arrays or loop-unrolling if there are two blocks?
This allows the DMA unrestricted access to internal memory
