CSC 252: Computer Organization Spring 2018: Lecture 26
Instructor: Yuhao Zhu
Department of Computer Science, University of Rochester
Action Items: • Programming Assignment 4 grades out • Programming Assignment 5 re-grade open • Programming Assignment 6 due soon
Carnegie Mellon
Announcement
• Programming assignment 6 is due at 11:59pm, Monday, April 30.
• Programming assignment 5 re-grade is open until 11:59pm, Friday.
• Programming assignment 4 grades are out.
Due Last Lecture
Today
• Shared variables in multi-threaded programming
  • Mutual exclusion using semaphore
  • Deadlock
• Thread-level parallelism
  • Amdahl’s Law: performance model of parallel programs
• Hardware support for multi-threading
  • Single-core
  • Hyper-threading
  • Multi-core
  • Cache coherence
Binary Semaphore Protecting Critical Section
• Define and initialize a mutex for the shared variable cnt:
```c
volatile long cnt = 0;  /* Counter */
sem_t mutex;            /* Semaphore that protects cnt */
Sem_init(&mutex, 0, 1); /* mutex = 1 */
```
• Surround critical section with P and V:
```c
/* goodcnt.c */
for (i = 0; i < niters; i++) {
    P(&mutex);
    cnt++;
    V(&mutex);
}
```
Deadlock
• Def: A process/thread is deadlocked if and only if it is waiting for a condition that will never be true
• General to concurrent/parallel programming (threads, processes)
• Typical scenario:
  • Processes 1 and 2 need two resources (A and B) to proceed
  • Process 1 acquires A, waits for B
  • Process 2 acquires B, waits for A
  • Both will wait forever!
Deadlocking With Semaphores
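```c
#include "csapp.h"   /* CS:APP wrappers: P, V, Sem_init, Pthread_create, Pthread_join */

#define NITERS 1000000   /* illustrative value; not given on the slide */

volatile long cnt = 0;
sem_t mutex[2];          /* mutex[0] and mutex[1] both guard cnt */

void *count(void *vargp)
{
    int i;
    int id = (int)(long)vargp;   /* thread id: 0 or 1 */
    for (i = 0; i < NITERS; i++) {
        P(&mutex[id]);           /* thread 0 locks 0 then 1; thread 1 locks 1 then 0 */
        P(&mutex[1-id]);
        cnt++;
        V(&mutex[id]);
        V(&mutex[1-id]);
    }
    return NULL;
}

int main()
{
    pthread_t tid[2];
    Sem_init(&mutex[0], 0, 1);   /* mutex[0] = 1 */
    Sem_init(&mutex[1], 0, 1);   /* mutex[1] = 1 */
    Pthread_create(&tid[0], NULL, count, (void*) 0);
    Pthread_create(&tid[1], NULL, count, (void*) 1);
    Pthread_join(tid[0], NULL);
    Pthread_join(tid[1], NULL);
    printf("cnt=%ld\n", cnt);    /* cnt is a long */
    exit(0);
}
```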
Tid[0]: P(s0); P(s1); cnt++; V(s0); V(s1);
Tid[1]: P(s1); P(s0); cnt++; V(s1); V(s0);
Avoiding Deadlock
• Acquire shared resources in the same order:
Tid[0]: P(s0); P(s1); cnt++; V(s0); V(s1);
Tid[1]: P(s0); P(s1); cnt++; V(s1); V(s0);
• Compare with the deadlock-prone version, where the threads acquire s0 and s1 in opposite orders:
Tid[0]: P(s0); P(s1); cnt++; V(s0); V(s1);
Tid[1]: P(s1); P(s0); cnt++; V(s1); V(s0);
Another Deadlock Example: Signal Handling
• Signal handlers are concurrent with main program and may share the same global data structures.
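```c
#include "csapp.h"   /* CS:APP wrappers: Signal, Fork, Execve */

static int x = 5;
static int y;        /* declared here so the example compiles */

void handler(int sig) { x = 10; }

int main(int argc, char **argv)
{
    int pid;
    Signal(SIGCHLD, handler);

    if ((pid = Fork()) == 0) { /* Child */
        Execve("/bin/date", argv, NULL);
    }

    if (x == 5)
        y = x * 2;  // You’d expect y == 10
    exit(0);
}
```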
What if the following happens:
• The parent process executes and finishes the test if (x == 5)
• The kernel then delivers SIGCHLD (the child has exited) and runs the handler, which sets x = 10
• When control returns to the parent, it computes y = x * 2, so y == 20!
Fixing the Signal Handling Bug
```c
#include "csapp.h"   /* CS:APP wrappers: Fork, Execve, Sigprocmask */

static int x = 5;
static int y;        /* declared here so the example compiles */

void handler(int sig) { x = 10; }

int main(int argc, char **argv)
{
    int pid;
    sigset_t mask_all, prev_all;
    sigfillset(&mask_all);
    signal(SIGCHLD, handler);

    if ((pid = Fork()) == 0) { /* Child */
        Execve("/bin/date", argv, NULL);
    }

    Sigprocmask(SIG_BLOCK, &mask_all, &prev_all);  /* block all signals */
    if (x == 5)
        y = x * 2;  // You’d expect y == 10
    Sigprocmask(SIG_SETMASK, &prev_all, NULL);     /* restore previous mask */

    exit(0);
}
```
• Block all signals before accessing a shared, global data structure.
How About Using a Mutex?
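```c
#include "csapp.h"   /* CS:APP wrappers: P, V, Sem_init, Fork, Execve */

static int x = 5;
static int y;        /* declared here so the example compiles */
sem_t mutex;         /* initialized to 1 in main */

void handler(int sig)
{
    P(&mutex);
    x = 10;
    V(&mutex);
}

int main(int argc, char **argv)
{
    int pid;
    Sem_init(&mutex, 0, 1);   /* mutex = 1 */
    signal(SIGCHLD, handler);

    if ((pid = Fork()) == 0) { /* Child */
        Execve("/bin/date", argv, NULL);
    }

    P(&mutex);
    if (x == 5)
        y = x * 2;  // You’d expect y == 10
    V(&mutex);

    exit(0);
}
```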
• This implementation can deadlock.
• If SIGCHLD arrives while the main program holds the mutex, the signal handler waits for a mutex that the main program has already acquired.
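• Key: the signal handler runs in the same process as the main program, and the kernel forces the handler to finish before returning to the main program. The main program therefore never reaches V(&mutex), so the handler’s P(&mutex) waits forever.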
Summary of Multi-threaded Programming
• Concurrent/parallel threads access shared variables
• Concurrent accesses must be protected to guarantee correctness
• Semaphores (e.g., mutex) provide a simple solution
• Careless use can lead to deadlock
• Take CSC 258 to learn more about avoiding deadlocks (and parallel programming in general)
Thinking in Parallel is Hard
Maybe Thinking is Hard
Thread-level Parallelism (TLP)
• Thread-level parallelism
  • Splitting a task into independent sub-tasks
  • Each thread is responsible for a sub-task
• Example: parallel summation of n numbers (0, 1, …, n-1)
  • Should add up to ((n-1)*n)/2
• Partition values 1, …, n-1 into t ranges
  • ⌊n/t⌋ values in each range
  • Each of t threads processes one range (sub-task)
  • Sum all sub-sums in the end
Amdahl’s Law
• Gene Amdahl (1922–2015). Giant in computer architecture
• Captures the difficulty of using parallelism to speed things up
• Amdahl’s Law:
  • f: Parallelizable fraction of a program
  • N: Number of processors (i.e., maximal achievable speedup)

$$\text{Speedup} = \frac{1}{(1 - f) + \frac{f}{N}}$$

• Completely parallelizable (f = 1): Speedup = N
• Completely sequential (f = 0): Speedup = 1
• Mostly parallelizable (f = 0.9, N = 1000): Speedup ≈ 9.9

Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” 1967.
Sequential Bottleneck
[Figure: speedup vs. f (parallel fraction)]
Why the Sequential Bottleneck?
• Maximum speedup is limited by the sequential portion
• Main cause: non-parallelizable operations on data
• The parallel portion is usually not perfectly parallel either
  • e.g., synchronization overhead

Each thread:
```
loop {
    Compute
    P(A)
    Update shared data
    V(A)
}
```
[Figure: per-thread execution timelines, with segments labeled N and C]
Can A Single Core Support Multi-threading?
• Need to multiplex between different threads (time slicing)
[Figure: Threads A, B, and C executed sequentially vs. multi-threaded (time-sliced) on one core]
Any benefits?
• Can single-core multi-threading provide any performance gains?
• Yes: if Thread A has a cache miss and the pipeline stalls, switch to Thread C. This improves overall performance.
[Figure: Threads A, B, and C; Thread A incurs a cache miss]
When to Switch?
• Coarse grained
  • Event based, e.g., switch on an L3 cache miss
  • Quantum based (every few thousand cycles)
• Fine grained
  • Cycle by cycle
  • Thornton, “CDC 6600: Design of a Computer,” 1970.
  • Burton Smith, “A pipelined, shared resource MIMD computer,” ICPP 1978. Seminal paper showing that multi-threading can avoid the need for branch prediction.
• Either way, the thread context must be saved/restored upon switching
Single-Core Internals
[Figure: single-core block diagram. Instruction control (instruction cache, PC, instruction decoder, instruction window, registers) feeds the functional units (Int Arith ×2, FP Arith, Load / Store), which access the data cache]
• Typically has multiple functional units, allowing multiple instructions to be issued at the same time
• Called a “superscalar” microarchitecture
Conventional Multi-threading
[Figure: functional-unit usage over time: Thread 1 runs, then a context switch, then Thread 2 runs.]
Hyper-threading
• Intel’s terminology; more generally known as Simultaneous Multi-threading (SMT)
• Replicate enough hardware structures to process K instruction streams: K copies of all registers, while the functional units are shared
[Figure: the single-core diagram with per-thread state duplicated for two threads (Reg A, Inst. Window A, PC A and Reg B, Inst. Window B, PC B), sharing the Instruction Cache, Instruction Decoder, Instruction Control, Functional Units (Int Arith, Int Arith, FP Arith, Load / Store), and Data Cache.]
Conventional Multi-threading vs. Hyper-threading
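[Figure: functional-unit usage over time. Conventional multi-threading: Thread 1, then a context switch, then Thread 2. Hyper-threading: instructions from Thread 1 and Thread 2 occupy the functional units in the same cycles.]
• Multiple threads actually execute in parallel (even on a single core)
• Little or no context-switch overhead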
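The two advantages can be put into numbers with a toy Python model (a sketch under stated assumptions, not a model of any real processor). The function names and the parameters `work` (instructions per thread), `ilp` (independent instructions a thread can offer per cycle), `width`, and `switch_cost` are hypothetical. In the conventional model each thread runs to completion before the next one, with an idle switch between them; in the SMT model all threads share the issue slots every cycle.

```python
import math

def conventional_cycles(work, ilp, width, switch_cost):
    """Threads run one after another; each context switch idles the core."""
    cycles = 0
    for i, (n, k) in enumerate(zip(work, ilp)):
        if i > 0:
            cycles += switch_cost               # save/restore thread context
        cycles += math.ceil(n / min(k, width))  # only this thread fills slots
    return cycles

def smt_cycles(work, ilp, width):
    """All threads share the issue slots every cycle (no context switch).
    Slots go to lower-numbered threads first; ilp entries must be >= 1."""
    remaining = list(work)
    cycles = 0
    while any(remaining):
        free = width
        for i, k in enumerate(ilp):
            take = min(k, free, remaining[i])
            remaining[i] -= take
            free -= take
        cycles += 1
    return cycles

# Two threads, 8 instructions each, 2 independent instructions per cycle,
# a 4-wide core, and a 3-cycle context switch (all hypothetical numbers).
print(conventional_cycles([8, 8], [2, 2], width=4, switch_cost=3))  # 11
print(smt_cycles([8, 8], [2, 2], width=4))                          # 4
```

With these numbers a single thread can only fill half of the four slots, so SMT both removes the switch cost and uses the slots the other thread leaves idle.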
Typical Multi-core Processor
• Traditional multiprocessing: symmetric multiprocessor (SMP)
• Every core is exactly the same, with private registers, L1/L2 caches, etc.
• All cores share the L3 (last-level cache, LLC) and main memory
[Figure: Core 0 … Core n-1, each with its own registers, L1 d-cache, L1 i-cache, and L2 unified cache; an L3 unified cache shared by all cores; main memory.]
Asymmetric Multiprocessor (AMP)
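[Figure: energy consumption vs. performance for a big core and a small core, each at several frequency levels.]
• Together, big and small cores running at different frequency levels offer a large performance-energy trade-off space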
Asymmetric Chip-Multiprocessor (ACMP)
• Already used in commodity devices (e.g., Samsung Galaxy S6, iPhone 7)
Combine Multi-core with Hyper-threading
• Common in laptop/desktop/server machines. E.g., 2 physical cores, each with 2 hyper-threads => 4 virtual cores.
• Not used in mobile processors (hyper-threading is costly to implement)
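The virtual-core arithmetic is a simple product, and Python's standard library reports the result as the operating system sees it. In the sketch below `virtual_cores` is a hypothetical helper; `os.cpu_count()` is the real standard-library call, which returns the number of logical CPUs (or `None` if it cannot be determined), so on a hyper-threaded machine it counts hardware threads rather than physical cores.

```python
import os

def virtual_cores(physical_cores, threads_per_core):
    """Hardware threads the OS schedules onto, i.e., "virtual cores"."""
    return physical_cores * threads_per_core

print(virtual_cores(2, 2))  # the slide's example: 4
# os.cpu_count() reports *logical* CPUs, so on a hyper-threaded machine it
# counts hardware threads, not physical cores. It returns None if unknown.
print(os.cpu_count())
```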
The Issue
• Assume that we have a multi-core processor. Thread 0 runs on Core 0, and Thread 1 runs on Core 1.
• Threads share variables: e.g., Thread 0 writes to an address, followed by Thread 1 reading from it.
• Each read should receive the value last written by anyone.
• Basic question: if multiple cores access the same data, how do they ensure they all see a consistent state?
[Figure: Thread 0 executes Mem[A] = 1; Thread 1 executes … and then Print Mem[A].]
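At the program level, the scenario can be written with Python's `threading` module (an illustrative choice; the lecture does not use Python). A `threading.Event` makes Thread 1's read happen after Thread 0's write, and the read must then return 1. The sketch does not pin threads to particular cores; on real hardware with private caches, it is cache coherence that makes the written value visible to the other core.

```python
import threading

mem = {"A": 0}               # stands in for the shared location Mem[A]
written = threading.Event()  # signals that Thread 0's write has happened
seen = []

def thread0():
    mem["A"] = 1             # Mem[A] = 1
    written.set()

def thread1():
    written.wait()           # ensure the read happens after the write
    seen.append(mem["A"])    # Print Mem[A]: must observe the last write, 1

workers = [threading.Thread(target=thread1), threading.Thread(target=thread0)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(seen[0])  # 1
```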
The Issue
• Without caches, the issue is (theoretically) solvable by using a mutex,
  • …because there is only one copy of x in the entire system: accesses to x in memory are serialized by the mutex.
[Figure: cores C1 and C2 connected by a bus to main memory holding x = 1000; one core performs Write: x = 1000 and the other performs Read: x.]
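The Issue
• What if each core caches the same data? How do they ensure they all see a consistent state? (Assume write-back caches.)
[Figure: C1 and C2 on a shared bus with main memory holding x = 1000. Both cores read x and cache 1000. One core computes x = x + 1000 and writes x, so its cache holds 2000 while the other cache and main memory still hold 1000. When the other core then reads x, the read should not return 1000!]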
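The stale read can be reproduced with a toy Python model of two private write-back caches that have no coherence mechanism at all. The class `WriteBackCache` is hypothetical, lines are never evicted (so nothing is ever written back), and core C1 is assumed to be the writer.

```python
class WriteBackCache:
    """A private write-back cache with no coherence protocol (the problem case)."""
    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.lines = {}        # private copies: address -> value

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]             # hit: return the private copy

    def write(self, addr, value):
        self.read(addr)                     # write-allocate
        self.lines[addr] = value            # write-back: memory not updated yet

memory = {"x": 1000}
c1, c2 = WriteBackCache(memory), WriteBackCache(memory)
c1.read("x")
c2.read("x")                                # both caches now hold 1000
c1.write("x", c1.read("x") + 1000)          # C1: x = x + 1000, 2000 in C1 only
print(c2.read("x"), memory["x"])            # 1000 1000: C2 sees a stale value
```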
Cache Coherence: The Idea
• Key issue: there are multiple copies of the same data in the system, and they could have different values at the same time.
• Key idea: ensure that all copies have the same value, i.e., that they are coherent.
• How? Two options:
  • Update: push the new value to all copies (in other caches)
  • Invalidate: invalidate the other copies (in other caches)
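A toy Python model can contrast the two options. The classes `Bus` and `CoherentCache` and the `run` helper are hypothetical, not from the lecture. Caches are write-back and never evict, so main memory is never updated; a read miss takes the value from another cache if one holds it. Counting one bus message per write that finds other copies is a simplifying assumption, but it illustrates the usual trade-off: update broadcasts on every such write, while invalidate pays once and later writes hit locally.

```python
class Bus:
    """Shared bus connecting private caches; counts coherence messages."""
    def __init__(self, memory, policy):
        assert policy in ("update", "invalidate")
        self.memory, self.policy = memory, policy
        self.caches, self.messages = [], 0

class CoherentCache:
    """Private write-back cache kept coherent by the bus policy."""
    def __init__(self, bus):
        self.bus, self.lines = bus, {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            # Miss: prefer another cache's copy (it may be newer than memory).
            copies = [c.lines[addr] for c in self.bus.caches
                      if c is not self and addr in c.lines]
            self.lines[addr] = copies[0] if copies else self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.read(addr)                       # write-allocate
        self.lines[addr] = value
        sharers = [c for c in self.bus.caches
                   if c is not self and addr in c.lines]
        if sharers:
            self.bus.messages += 1            # one broadcast reaches all sharers
            for c in sharers:
                if self.bus.policy == "update":
                    c.lines[addr] = value     # Update: push the new value
                else:
                    del c.lines[addr]         # Invalidate: drop the stale copy

def run(policy, n_writes):
    bus = Bus({"x": 1000}, policy)
    c1, c2 = CoherentCache(bus), CoherentCache(bus)
    c1.read("x")
    c2.read("x")
    for i in range(n_writes):
        c1.write("x", 2000 + i)
    return c2.read("x"), bus.messages

print(run("update", 3))      # (2002, 3)
print(run("invalidate", 3))  # (2002, 1)
```

Both policies return the last written value to C2; they differ in how much bus traffic they generate when one core writes the same data repeatedly.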
Invalidate-Based Cache Coherence
• Associate each cache line with 3 states: Modified (M), Invalid (I), Shared (S)
• The diagram tracks the state of x in C2’s cache; transition labels use the syntax Event/Action
[Figure: cores C1 and C2 with private caches connected by a bus to main memory, which initially holds x = 1000, next to the I/S/M state diagram for x in C2’s cache.]
Walkthrough:
• C2 reads x: a miss, so C2 issues BusRd and caches 1000 (PrRd/BusRd: I → S).
• C1 reads x: C2 snoops the BusRd and supplies the data; both caches now hold 1000 (BusRd/Supply Data: S → S).
• C2 reads x again: a hit with no bus traffic (PrRd/—: S → S).
• C2 writes x = 5000: C2 sends an invalidation that removes C1’s copy; C2 now holds 5000 while main memory still holds 1000 (PrWr/Invd: S → M).
• Further reads and writes by C2 hit locally (PrRd/— and PrWr/—: M → M).
• C1 reads x: C2 snoops the BusRd and flushes 5000, which updates both main memory and C1’s cache (BusRd/Flush: M → S).
Invalidate-Based Cache Coherence
[Figure: C1 and C2 caches on the shared bus with main memory; the cached value of x is 5000 before Write: x = 7000 and 7000 afterward.]
• Write: x = 7000 from a cache whose copy of x is Invalid: the write misses, so the writer issues BusRdX (read-exclusive) and moves I → M (PrWr/BusRdX).
• A cache holding x in Modified snoops the BusRdX, flushes its data, and moves M → I (BusRdX/Flush).
• The completed diagram also shows that a Shared copy snooping an invalidation moves to I (Invd/—), and that a line in I ignores bus traffic (BusRd/—, BusRdX/—, Invd/—).
[Figure: final state diagram for x in C2’s cache. States I, S, M; transitions PrRd/BusRd, PrWr/BusRdX, PrWr/Invd, PrRd/—, PrWr/—, BusRd/Supply Data, BusRd/Flush, BusRdX/Flush, BusRd/—, BusRdX/—, Invd/—.]
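The protocol can be replayed with a small Python model (a sketch, not the lecture's code). It assumes one single-word cache line x per core and an atomic snooping bus; the class name `MSIBus` and its methods are hypothetical. Where the slides do not spell out an action, the sketch makes the usual MSI choice: a Shared copy that snoops BusRdX or Invd simply drops to I, and Flush writes the dirty value back to memory as well as supplying it (consistent with memory becoming 5000 in the walkthrough).

```python
I, S, M = "I", "S", "M"

class MSIBus:
    """Toy snooping bus: each core has a private cache holding one line, x."""

    def __init__(self, n_cores, memory_value):
        self.memory = memory_value
        self.state = [I] * n_cores
        self.data = [None] * n_cores
        self.trace = []                      # (core, old state, event, new state)

    def _move(self, core, event, new_state):
        self.trace.append((core, self.state[core], event, new_state))
        self.state[core] = new_state

    def _snoop(self, requester, event):
        """Deliver BusRd, BusRdX, or Invd to every other cache.
        Returns data supplied by a snooping cache, or None."""
        supplied = None
        for c, st in enumerate(self.state):
            if c == requester or st == I:
                continue                     # I ignores BusRd, BusRdX, Invd
            if st == M and event in ("BusRd", "BusRdX"):
                self.memory = self.data[c]   # Flush: dirty data onto bus and memory
                supplied = self.data[c]
                self._move(c, event, S if event == "BusRd" else I)
            elif st == S and event == "BusRd":
                supplied = self.data[c]      # BusRd/Supply Data, stay in S
                self._move(c, event, S)
            elif st == S:                    # BusRdX or Invd: drop the copy
                self._move(c, event, I)
        return supplied

    def read(self, core):
        if self.state[core] == I:            # PrRd/BusRd (read miss)
            supplied = self._snoop(core, "BusRd")
            self.data[core] = self.memory if supplied is None else supplied
            self._move(core, "PrRd", S)
        return self.data[core]               # in S or M: PrRd/— (hit)

    def write(self, core, value):
        if self.state[core] == I:            # PrWr/BusRdX (write miss)
            self._snoop(core, "BusRdX")
            self._move(core, "PrWr", M)
        elif self.state[core] == S:          # PrWr/Invd (upgrade)
            self._snoop(core, "Invd")
            self._move(core, "PrWr", M)
        self.data[core] = value              # in M: PrWr/— (hit)

# Replay the walkthrough: core 0 = C1, core 1 = C2.
bus = MSIBus(n_cores=2, memory_value=1000)
bus.read(1)            # C2 reads x
bus.read(0)            # C1 reads x; C2 supplies the data
bus.read(1)            # C2 hits
bus.write(1, 5000)     # C2 writes; C1's copy is invalidated
print(bus.read(0))     # 5000: C2 flushes on C1's read
print(bus.state, bus.memory)                     # ['S', 'S'] 5000
print([(old, ev, new) for c, old, ev, new in bus.trace if c == 1])
```

The last line prints the transitions taken by C2’s copy of x, in order. Calling `write` from a core in I while another core holds x in M exercises the PrWr/BusRdX and BusRdX/Flush transitions.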
Readings: Cache Coherence
• Most helpful
  • Culler and Singh, Parallel Computer Architecture, Chapter 5.1 (pp. 269–283) and Chapter 5.3 (pp. 291–305).
  • Patterson and Hennessy, Computer Organization and Design, Chapter 5.8 (pp. 534–538 in the 4th and 4th revised eds.).
  • Papamarcos and Patel, “A low-overhead coherence solution for multiprocessors with private cache memories,” ISCA 1984.
• Also very useful
  • Censier and Feautrier, “A new solution to coherence problems in multicache systems,” IEEE Trans. Computers, 1978.
  • Goodman, “Using cache memory to reduce processor-memory traffic,” ISCA 1983.
  • Laudon and Lenoski, “The SGI Origin: a ccNUMA highly scalable server,” ISCA 1997.
  • Martin et al., “Token coherence: decoupling performance and correctness,” ISCA 2003.
  • Baer and Wang, “On the inclusion properties for multi-level cache hierarchies,” ISCA 1988.
Does Hardware Have to Keep Caches Coherent?
• Hardware-guaranteed cache coherence is complex to implement.
• Can programmers ensure cache coherence themselves?
• Key: the ISA must provide cache flush/invalidate instructions:
  • FLUSH-LOCAL A: flushes/invalidates the cache block containing address A from the processor's local cache.
  • FLUSH-GLOBAL A: flushes/invalidates the cache block containing address A from all other processors' caches.
  • FLUSH-CACHE X: flushes/invalidates all blocks in cache X.
• Classic example: the TLB
  • Hardware does not guarantee that the TLBs of different cores are coherent.
  • The ISA provides instructions for the OS to flush stale PTEs from a TLB.
  • When the OS changes a PTE, it must make every core that may have cached that PTE flush it; this is called a “TLB shootdown”.
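To make the software alternative concrete, here is a minimal model of two cores whose private caches are not kept coherent by hardware. It is a sketch under stated assumptions: write-back, write-allocate caches with one-word blocks, and hypothetical names `NonCoherentCache`, `flush_local`, and `flush_global` standing in for the FLUSH-LOCAL and FLUSH-GLOBAL instructions listed above (they are not a real ISA). It shows a consumer reading a stale value until the producer explicitly flushes.

```python
class NonCoherentCache:
    """Private write-back cache with no hardware coherence (hypothetical model).

    Blocks are one word, so an address names a whole block.
    """

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.lines = {}       # address -> [value, dirty]

    def read(self, addr):
        if addr not in self.lines:                     # miss: fetch from memory
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]                     # hit: copy may be stale

    def write(self, addr, value):
        self.lines[addr] = [value, True]               # write-allocate, write-back

    def flush_local(self, addr):
        """Models FLUSH-LOCAL A: write the block back if dirty, then invalidate it."""
        line = self.lines.pop(addr, None)
        if line is not None and line[1]:
            self.memory[addr] = line[0]


def flush_global(caches, me, addr):
    """Models FLUSH-GLOBAL A: flush/invalidate A in every cache except `me`."""
    for cache in caches:
        if cache is not me:
            cache.flush_local(addr)


# Usage: core 0 produces x, core 1 consumes it.
memory = {"x": 1000}
c0, c1 = NonCoherentCache(memory), NonCoherentCache(memory)

c1.read("x")                     # core 1 caches x = 1000
c0.write("x", 7000)              # the new value sits only in core 0's cache
stale = c1.read("x")             # still 1000: nothing told core 1's cache
c0.flush_local("x")              # FLUSH-LOCAL: write 7000 back to memory
flush_global([c0, c1], c0, "x")  # FLUSH-GLOBAL: drop core 1's stale copy
fresh = c1.read("x")             # miss, so core 1 now reads 7000
```

The burden moves to the programmer: skipping either flush leaves core 1 reading 1000. A TLB shootdown follows the same pattern for cached PTEs, with the OS asking every other core to flush its own copy.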
Take CSC 251/ECE 204 to learn more about advanced computer architecture concepts.