Lecture6
![Page 1: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/1.jpg)
High Performance Computing
Jawwad Shamsi
Lecture #6
27th January 2010
![Page 2: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/2.jpg)
Recap
• Cache Coherence
• NUMA
![Page 3: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/3.jpg)
Today’s topics
• Cache Coherence – Continuation
• Vector Processing
![Page 4: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/4.jpg)
Cache Coherence
• In SMP or NUMA systems, several caches may hold copies of the same data
  – Each copy may have a different value of a data item
  – Coherency must be maintained
• How?
![Page 5: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/5.jpg)
Cache Coherence: Two Approaches
• Write back: main memory is updated only when the modified cache line is flushed.
• Write through: every write updates the cache as well as main memory.
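The difference between the two policies can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Cache` class, its `write_through` flag, and the dict standing in for main memory are hypothetical names made up for illustration, not part of the lecture.

```python
class Cache:
    """Toy cache in front of a dict that stands in for main memory (illustrative only)."""

    def __init__(self, memory, write_through):
        self.memory = memory              # backing store: address -> value
        self.write_through = write_through
        self.lines = {}                   # cached copies: address -> value
        self.dirty = set()                # addresses modified but not yet written back

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value     # write through: memory updated on every write
        else:
            self.dirty.add(addr)          # write back: defer the memory update

    def flush(self):
        """Write all dirty lines to memory (nothing to do under write through)."""
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


wt_mem, wb_mem = {0x10: 0}, {0x10: 0}
wt = Cache(wt_mem, write_through=True)
wb = Cache(wb_mem, write_through=False)
wt.write(0x10, 7)
wb.write(0x10, 7)
print(wt_mem[0x10], wb_mem[0x10])   # write-through memory is current, write-back memory is stale
wb.flush()
print(wb_mem[0x10])                 # after the flush, memory holds the new value
```

Between the write and the flush, the write-back memory holds a stale value, which is exactly the window in which another processor could read inconsistent data.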
![Page 6: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/6.jpg)
Implementations
• Software solutions:
  – Compile-time decision
  – Conservative
  – Inefficient cache utilization
• Hardware solutions:
  – Runtime decision
  – More effective
![Page 7: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/7.jpg)
Hardware based solution
• Directory Protocol
• Snoopy Protocol
![Page 8: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/8.jpg)
Directory
• Centralized controller
  – An individual cache controller makes a request
    • The centralized controller checks the directory and issues commands
  – The centralized controller updates its directory information
![Page 9: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/9.jpg)
Directory
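• Write
  – Processor requests exclusive write access to a line
  – Controller sends a message to every processor holding a copy
  – Those copies are invalidated
• Read (of a line held exclusively by another processor)
  – Controller issues a command to the holding processor
  – Holding processor
    • Writes the line back to main memory (MM)
    • Read is then permitted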
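The write and read handling above can be sketched as a small Python model. It is a simplified illustration, not a real protocol implementation: the `Directory` class, its fields, and the cache ids `"A"` and `"B"` are hypothetical, and each line is treated as a single value that a write overwrites completely.

```python
class Directory:
    """Toy central directory: tracks which caches hold a line and who owns it exclusively."""

    def __init__(self, memory):
        self.memory = memory    # main memory: address -> value
        self.sharers = {}       # address -> set of cache ids holding a copy
        self.owner = {}         # address -> cache id holding the exclusive (modified) copy
        self.caches = {}        # cache id -> {address: value}

    def add_cache(self, cid):
        self.caches[cid] = {}

    def write(self, cid, addr, value):
        # Grant exclusive access: invalidate every other copy first.
        for other in self.sharers.get(addr, set()) - {cid}:
            self.caches[other].pop(addr, None)
        self.sharers[addr] = {cid}
        self.owner[addr] = cid
        self.caches[cid][addr] = value      # main memory is NOT updated yet

    def read(self, cid, addr):
        holder = self.owner.get(addr)
        if holder == cid:
            return self.caches[cid][addr]   # requester already owns the line
        if holder is not None:
            # Command the holding processor to write back to main memory.
            self.memory[addr] = self.caches[holder][addr]
            del self.owner[addr]            # the line is now shared, not exclusive
        value = self.memory[addr]           # read is permitted from up-to-date memory
        self.caches[cid][addr] = value
        self.sharers.setdefault(addr, set()).add(cid)
        return value


d = Directory({0: 1})
d.add_cache("A")
d.add_cache("B")
d.read("A", 0)
d.read("B", 0)
d.write("A", 0, 9)       # B's copy is invalidated; memory still holds the old value
print(d.caches["B"], d.memory[0])
print(d.read("B", 0))    # forces A to write back, then B reads the new value
```

Every request goes through the one `Directory` object, which is also why a centralized controller can become a bottleneck.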
![Page 10: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/10.jpg)
Directory
• Disadvantage
  – Centralized controller
  – Bottleneck
• Advantage
  – Useful in large-scale systems
![Page 11: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/11.jpg)
Snoopy Protocol
• Update operation is announced
• All cache controllers snoop
• Bus architecture
  – Caution: increased bus traffic
![Page 12: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/12.jpg)
Snoopy Protocol
• Two approaches
  – Write Invalidate
    • One writer
    • Multiple readers
    • Exclusive: the writer invalidates the other caches' entries
  – Write Update
    • Multiple writers
    • Every write is propagated to all other caches
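A minimal Python sketch of the two snoopy approaches follows. The `Bus` and `SnoopyCache` classes and the `demo` helper are hypothetical names for illustration. As a simplification, this toy broadcasts on every write, whereas a real write-invalidate cache would not need to broadcast again once it holds the line exclusively.

```python
class Bus:
    """Shared bus: every write is announced to all other attached caches."""

    def __init__(self):
        self.caches = []

    def broadcast(self, sender, addr, value):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(addr, value)


class SnoopyCache:
    def __init__(self, bus, policy):
        self.bus = bus
        self.policy = policy        # "invalidate" or "update"
        self.lines = {}             # address -> value
        bus.caches.append(self)

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast(self, addr, value)

    def snoop(self, addr, value):
        if addr not in self.lines:
            return                          # no local copy, nothing to do
        if self.policy == "invalidate":
            del self.lines[addr]            # write invalidate: drop the stale copy
        else:
            self.lines[addr] = value        # write update: refresh the copy in place


def demo(policy):
    bus = Bus()
    a, b = SnoopyCache(bus, policy), SnoopyCache(bus, policy)
    a.lines[0x20] = b.lines[0x20] = 1       # both caches start with a copy
    a.write(0x20, 5)
    return b.lines.get(0x20)                # None means b's copy was invalidated


print(demo("invalidate"), demo("update"))
```

Under write invalidate the other cache must re-fetch the line on its next read; under write update it already has the new value, at the cost of sending the data on the bus with each write.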
![Page 13: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/13.jpg)
Write Invalidate
• The MESI protocol: P4 processor
  – Data cache: two status bits, 4 states
    • Modified
    • Exclusive
    • Shared
    • Invalid
    • See Table
![Page 14: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/14.jpg)
4 Possibilities
• Read miss
  – EX to SH
  – SH to SH
  – MO to SH
• Read hit
• Write miss
  – RWITM (read with intent to modify)
  – MO to IN
  – SH to IN
• Write hit
  – SH to IN
  – EX to MO
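The transitions listed above can be written as two small Python functions, one for the cache that issues the access and one for a cache that snoops it on the bus. This is a sketch under assumptions: the function names, the `"read"` / `"rwitm"` / `"invalidate"` event strings, and the `others_have_copy` flag are invented for illustration, and write-back timing is ignored.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "MO"
    EXCLUSIVE = "EX"
    SHARED = "SH"
    INVALID = "IN"


M, E, S, I = State.MODIFIED, State.EXCLUSIVE, State.SHARED, State.INVALID


def snooper_next(state, bus_event):
    """Next state of a line in a cache that observes another cache's bus event."""
    if state is I:
        return I                                # no valid copy: nothing changes
    if bus_event == "read":                     # another cache had a read miss
        return S                                # EX to SH, SH to SH, MO to SH
    if bus_event in ("rwitm", "invalidate"):    # write miss, or write hit on a shared line
        return I                                # MO to IN, SH to IN
    raise ValueError(f"unknown bus event: {bus_event}")


def requester_next(state, op, others_have_copy=False):
    """Next state of a line in the cache whose processor issues the access."""
    if op == "read":
        if state is I:                          # read miss
            return S if others_have_copy else E
        return state                            # read hit: no state change
    if op == "write":
        return M                                # write miss (RWITM) and write hit end in MO
    raise ValueError(f"unknown operation: {op}")


# Two caches, A and B, sharing one line that starts invalid in both.
a = requester_next(I, "read")                                          # A read miss, no other copy
b, a = requester_next(I, "read", True), snooper_next(a, "read")        # B read miss
a, b = requester_next(a, "write"), snooper_next(b, "invalidate")       # A write hit on shared line
print(a.value, b.value)
```

After the final step, A holds the line as Modified and B's copy is Invalid, matching the "SH to IN" entry under write hit.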
![Page 15: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/15.jpg)
L1- L2 Cache Consistency
![Page 16: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/16.jpg)
Parallel programming and Amdahl's Law
Suppose a fraction 1/N of the execution time is spent in sequential code
and the remaining 1 - 1/N in parallel code
![Page 17: Lecture6](https://reader038.fdocuments.us/reader038/viewer/2022100506/547be0665906b572798b46a7/html5/thumbnails/17.jpg)
Amdahl's Law
• Speedup: speed gain of using parallel processors vs. a single processor
• Speedup = 1/(s + (p/N))
  – s = sequential fraction of the code, p = parallel fraction, N = no. of processors
  – Speedup = T(1)/T(j)
    • For j parallel processors
• As problem size increases, p may rise and s may decrease
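A short Python sketch makes the formula concrete. The function name `amdahl_speedup` and the sample sequential fraction of 0.1 are assumptions chosen for illustration; p is taken to be 1 - s.

```python
def amdahl_speedup(s, n):
    """Speedup 1 / (s + p/n), where s is the sequential fraction and p = 1 - s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1 processor")
    p = 1.0 - s
    return 1.0 / (s + p / n)


# With 10% sequential code the speedup flattens out well below the processor count.
for n in (1, 2, 8, 64):
    print(n, round(amdahl_speedup(0.1, n), 2))
```

For s = 0.1 the speedup can never exceed 1/s = 10, however many processors are added, which is why a shrinking s for larger problem sizes matters.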