Memory Hierarchy Part 2

53
Faculty of Computer Science CMPUT 229 © 2006 Memory Hierarchy Part 2 Refreshing Memory

description

Refreshing Memory. Memory Hierarchy Part 2. Writing Cache-Conscious Programs. Problem: Write C code for a function that computes the sum of the elements of a two dimensional array, a[M][N], of integers. int SumArray ( int a[][], int M, int N). - PowerPoint PPT Presentation

Transcript of Memory Hierarchy Part 2

Page 1: Memory Hierarchy Part 2

Faculty of Computer Science

CMPUT 229 © 2006

Memory HierarchyPart 2

Refreshing Memory

Page 2: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Writing Cache-Conscious Programs

Problem: Write C code for a function that computes the sum of the elements of a two dimensional array, a[M][N], of integers.

int SumArray(int a[][], int M, int N)

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

1 int SumArrayCols(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (j=0 ; j<N ; i++) 7 for (i=0 ; i<M ; i++) 8 sum += a[i][j]; 8 return sum; 9 }

Byant/O’Hallaron, pp. 508

Page 3: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]a[0][1]a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

Byant/O’Hallaron, pp. 508

••• Cache

Memory

Page 4: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]a[0][1]a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

Byant/O’Hallaron, pp. 508

a[0][0] a[0][1] a[0][2] a[0][3]

••• Cache

Memory

Page 5: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]

a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][1]

Cache

Memory Byant/O’Hallaron, pp. 508

Page 6: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]

a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][1]

Cache

Memory Byant/O’Hallaron, pp. 508

Page 7: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]

a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][1]

Cache

Memory Byant/O’Hallaron, pp. 508

Page 8: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]

a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][5] a[1][0] a[1][1]a[0][4]

a[0][1]

Cache

Memory Byant/O’Hallaron, pp. 508

Page 9: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]

a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][5] a[1][0] a[1][1]a[0][4]

a[0][1]

Cache

Memory Byant/O’Hallaron, pp. 508

Page 10: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayRows Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]

a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayRows(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (i=0 ; i<M ; i++) 7 for (j=0 ; j<N ; j++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][5] a[1][0] a[1][1]a[0][4]

a[0][1]

Cache

Memory Byant/O’Hallaron, pp. 508

Page 11: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayCols Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]a[0][1]a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayCols(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (j=0 ; j<N ; i++) 7 for (i=0 ; i<M ; i++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

••• Cache

MemoryByant/O’Hallaron, pp. 508

Page 12: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayCols Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]a[0][1]a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayCols(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (j=0 ; j<N ; i++) 7 for (i=0 ; i<M ; i++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

•••

a[0][5] a[1][0] a[1][1]a[0][4]

Cache

Byant/O’Hallaron, pp. 508

Page 13: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

SumArrayCols Data Access Order

a[1][2]a[1][3]a[1][4]a[1][5]a[2][0]a[2][1]a[2]2]a[2][3]a[2][4]a[2][5]a[3][0]a[3][1]a[3][2]a[3][3]a[3][4]

•••

a[0][0]a[0][1]a[0][2]a[0][3]a[0][4]a[0][5]a[1][0]a[1][1]

0x8000 4000

0x8000 4004

0x8000 4010

0x8000 4024

0x8000 4008

0x8000 4014

0x8000 4028

0x8000 403C

0x8000 400C

0x8000 4018

0x8000 402C

0x8000 4040

0x8000 401C

0x8000 4030

0x8000 4044

0x8000 4050

0x8000 4020

0x8000 4034

0x8000 4048

0x8000 4054

0x8000 4038

0x8000 404C

0x8000 4058

•••

1 int SumArrayCols(int a[][], int M, int N) 2 { 3 int i, j; 4 int sum = 0; 5 6 for (j=0 ; j<N ; i++) 7 for (i=0 ; i<M ; i++) 8 sum += a[i][j]; 8 return sum; 9 }

a[0][0] a[0][1] a[0][2] a[0][3]

a[2][1] a[2][2] a[2][3]

•••

a[2][0]a[0][5] a[1][0] a[1][1]a[0][4]

Cache

Byant/O’Hallaron, pp. 508

Page 14: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

The Cost of Programming Productivity

Easy-to-read and easy-to-maintain code often result

in lower runtime performance.

StudentClass

University

Page 15: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

The Cost of Programming Productivity

Abstraction

Inheritance

StudentProfessor Support Staff

Person

Page 16: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

The Cost of Programming Productivity

Data Encapsulation

Person

Date of BirthGender

AddressCitizenship

Name

Driver Lic.

Student

FacultyDate of Adm

DepartmentProgram

Univ. ID

Classes Enr.Grades

Page 17: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Data Locality Primer

AMD Atlon 64 X2

Page 18: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Data Locality Primer: Cache Organization

POWER5 Cache Organization

– L1 Data Cache: 32 Kbytes, 128-byte cache lines

– L2 Cache: 1.44 Mbytes, 128-byte cache lines

– L3 Cache: 32 Mbytes, 512-byte cache lines

Page 19: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Data Locality Primer: Cache OrganizationBytes

FacultyDate of Adm

DepartmentProgram

Univ. ID

Classes Enr.Grades

Student:

1 byte4 bytes

1 byte2 bytes

4 bytes

4 bytes4 bytes4 bytes

Date of BirthGender

AddressCitizenship

Name

Driver Lic.

Person:

4 byte1 bytes

32 bytes16 bytes

32 bytes

4 bytes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

2

•••

255

Cac

he L

ines

Page 20: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Data Locality Primer: Data in Memory

Mem

ory

Add

ress

Bytes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

128

256

384

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

••• Fa. De Progr. Classes Enr. Grades Univ. ID

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

••• Fa. De Progr. Classes Enr. Grades Univ. ID

FacultyDate of Adm

DepartmentProgram

Univ. ID

Classes Enr.Grades

Student:

1 byte4 bytes

1 byte2 bytes

4 bytes

4 bytes4 bytes4 bytes

Page 21: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

0 ••• 30 31 32 33 ••• 36 37 ••• 47 48 ••• 51 52 ••• 69 ••• 84 85 ••• 89

768

1024

1152

1280Mem

ory

Add

ress

Data Locality Primer: Data in Memory

Name DofB Ge Citizens. Address Dr. Lic.

Namedress Ge Citizens. Dr. Lic. DofB

Name DofB Ge Citizens. Address Dr. Lic.

Namedress Ge Citizens. Dr. Lic. DofB

Date of BirthGender

AddressCitizenship

Name

Driver Lic.

Person:

4 byte1 bytes

32 bytes16 bytes

32 bytes

4 bytes

Page 22: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

0 ••• 30 31 32 33 ••• 36 37 ••• 47 48 ••• 51 52 ••• 69 ••• 84 85 ••• 89

768

1024

1152

1280Mem

ory

Add

ress

Data Locality Primer: Data in Memory

Mem

ory

Add

ress

Bytes

Name DofB Ge Citizens. Address Dr. Lic.

Namedress Ge Citizens. Dr. Lic. DofB

Name DofB Ge Citizens. Address Dr. Lic.

Namedress Ge Citizens. Dr. Lic. DofB

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

128

256

384

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

••• Fa. De Progr. Classes Enr. Grades Univ. ID

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

••• Fa. De Progr. Classes Enr. Grades Univ. ID

Page 23: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Example: A search through the data structures

How many Computing Science students are

younger than 23 year old?Bytes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

2

•••

255

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Cac

he L

ines

Page 24: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Example: A search through the data structures

How many Computing Science students are younger than 23 year old?

Load 128 bytes and uses 5 bytes!

Bytes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

2

•••

255

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Cac

he L

ines

Page 25: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Example: A search through the data structures

How many Computing Science students are younger than 23 year old?

Load 128 bytes and uses 5 bytes!

Bytes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

2

•••

255

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Name DofB Ge Citizens. Address Dr. Lic.

Cac

he L

ines

Page 26: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Example: A search through the data structures

How many Computing Science students are younger than 23 year old?

Load 128 bytes and uses 5.3 bytes!

Load 128 bytes and uses 5.8 bytes!

Bytes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ••• 127

0

2

•••

255

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Name DofB Ge Citizens. Address Dr. Lic.

Cac

he L

ines

Page 27: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Data Reshaping for Arrays of StructuresStudent *ListOfStudents;

….

ListOfStudents = (Student*)malloc(….);

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Univ. ID Date of Adm. Fa. De Progr. •••Classes Enr. Grades

Univ. ID

Date of Adm.

Fa.

De

Progr.

Univ. ID

Date of Adm.

Fa.

De

Progr.

Univ. ID

Date of Adm.

Fa.

De

Progr.

••• •••

•••

•••

•••

•••

•••

•••

Page 28: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Reshaping Linked Data Structures

E.g. A linked list of students

struct student { int age; int studentNumber; int studentProgram;float averageGrade;struct student *next;

};

age num gpaprog age num gpaprog …

Page 29: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Maximal Structure Splitting

age1 num1 gpa1prog1

age2 num2 gpa2prog2

age3 num3 gpa3prog3

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

Page 30: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Is it safe to transform a given data structure?

Build alias set

– If a pointer P points to the structure

• Then all the objects in the points-to set of P must have the

same layout.

• The layout of two structures is the same if each field has the

same offset and the same length.

Page 31: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

If pool is full another

pool can be allocated

Page 32: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1

num1

prog1

gpa1

next1

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

Page 33: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1 age2

num1 num2

prog1 prog2

gpa1 gpa2

next1 next2

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

Page 34: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

Page 35: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

age4

num4

prog4

gpa4

next4

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

Page 36: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

age4

num4

prog4

gpa4

next4

age5

num5

prog5

gpa5

next6

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

Page 37: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

age4

num4

prog4

gpa4

next4

age5

num5

prog5

gpa5

next6

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

If pool is full another

pool can be allocated

Page 38: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pool Allocation

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

age4

num4

prog4

gpa4

next4

age5

num5

prog5

gpa5

next6

age7

num7

prog7

gpa7

next7

Intercept mallocs and

replace by pool

allocation: each

structure layout gets

its own pool.

If pool is full another

pool can be allocated

Page 39: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Pointer Dereferencing - Before

struct student { int age; int studentNumber; int studentProgram;float averageGrade;struct student *next;

};

struct student *s = malloc (sizeof (struct student));

s->age = 21;

s->averageGrade = 3.8;

s->age == *(s + 0)

s->averageGrade == *(s + 12)

age num gpaprog0 4 8 12 16

age num gpaprog …0 4 8 12 16

s

Page 40: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Uniform Structure Splitting

Requires that all in the structure have the same

number of bytes

– Advantage

• Simpler address computation

– Disadvantage

• Either restrict the application of the technique

• Or wastes memory with padding to create same-length fields

Page 41: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Uniform Splitting Pointer Transformation

age1 age2 age3

num1 num2 num3

prog1 prog2 prog3

gpa1 gpa2 gpa3

next1 next2 next3

s1->age == *(s1 + 0)

s1->gpa == *(s1 + (3 * pool_field_len))

s1

Pool_field_len is the same for each field

3 * pool_field_len

pool_field_len

Page 42: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Non-Uniform Structure Splitting

Requires pools to be aligned by the size of the pool.

E.g. If the pools are 4k then they must be aligned on

4k boundaries.

More general

Address calculation is more involved

Page 43: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Non-UniformExample

struct example { type_2 a; /* 4 bytes */type_8 b; /* 8 bytes */type_4 c; /* 4 bytes */};

s

How can the compiler

find the address to

access:

s->c

Page 44: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Non-UniformExample

struct example { type_2 a; /* 4 bytes */type_8 b; /* 8 bytes */type_4 c; /* 4 bytes */};

s

How can the compiler

find the address to

access:

s->c

pool_base = s & 0x0…0FFF

index = (s – pool_base) / 2

field_base = (2+8)*num_structs_per_pool

s->c = *(s + field_base + 4*index - index*2)

s->c = *(s + field_base + 4*index - s + pool_base)

s->c = *(field_base + 4*index + pool_base)

Page 45: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (Speedup)Power 4 Power 5

Page 46: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (Instruction Count)

Power 4 Power 5

Page 47: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (CPI)

Power 4 Power 5

Page 48: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (DTLB Misses)

Power 4 Power 5

Page 49: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (L1D Misses)

Power 4 Power 5

Page 50: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (L2 Misses)

Power 4 Power 5

Page 51: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Micro Benchmarks (L3 Misses)

Power 4 Power 5

Page 52: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments

Evaluated SPEC 2000, Olden and LLU

Many opportunities in SPEC missed

– Pointer analysis didn’t have enough precision to identify

opportunities in the SPEC 2000 benchmarks

– Could only identify small opportunities

– No impact on performance

Page 53: Memory Hierarchy Part 2

© 2006

Department of Computing Science

CMPUT 229

Experiments - Olden & LLU (Speedup)

Power 4 Power 5

bhem

3d

healt

h

power tsp llu bh

em3d

healt

h

power tsp llu