Kris Lange Nopparat suwaanarat Pree Thiengburanathum.
![Page 1: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/1.jpg)
Heterogeneous Thread Assignment Simulation
Kris Lange
Nopparat suwaanarat
Pree Thiengburanathum
![Page 2: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Motivation
• Review concepts
• M5 architecture
• Configuring M5 Simulator
• Simulation Results and Analysis
• Conclusion
![Page 3: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/3.jpg)
Introduction
• Basis: "Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures"
• The paper makes 2 claims:
◦ Heterogeneous CMPs outperform homogeneous CMPs (for a fixed total die size)
◦ The benefits of heterogeneous CMPs are enhanced by dynamic thread assignment policies
![Page 4: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/4.jpg)
Motivation
• Gain a deeper understanding of the research paper
• Verify the results of this paper
• Gain hands-on experience running a peer-reviewed experiment
![Page 5: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/5.jpg)
Review: Concepts
• Heterogeneous CMP system
• Homogeneous CMP system
• Heterogeneous vs. homogeneous under multi-programmed workloads
![Page 6: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/6.jpg)
Review: Concepts
• Heterogeneous CMP system
◦ Many simple cores = higher thread parallelism
◦ Fewer, larger cores = lower thread parallelism
• We want to maximize resource utilization and achieve a high degree of inter-thread parallelism.
• How? By mapping running tasks to cores and using a control mechanism.
![Page 7: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/7.jpg)
Review: Concept
• Control mechanism: thread assignment policies
◦ Static thread assignment: random, best
◦ Dynamic thread assignment: round robin, IPC driven
• Which one has a better total execution time?

Thread    P1    P2
A         1.6   0.4
B         1.5   1
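The question posed with the table above can be worked through numerically. The sketch below assumes the table entries are the IPC of each thread on each core, that each core runs exactly one thread, and that both threads execute the same number of instructions; none of these assumptions is stated on the slide, and the function names are illustrative only.

```python
from itertools import permutations

# Assumed meaning of the slide's table: IPC of each thread on each core.
IPC = {
    "A": {"P1": 1.6, "P2": 0.4},
    "B": {"P1": 1.5, "P2": 1.0},
}


def makespan(assignment, instructions=1_000_000):
    """Cycles until the last thread finishes (one thread per core).

    `instructions` is a hypothetical, equal instruction count per thread.
    """
    return max(instructions / IPC[thread][core]
               for thread, core in assignment.items())


def best_assignment(threads=("A", "B"), cores=("P1", "P2")):
    """Try every one-to-one thread-to-core mapping and keep the fastest."""
    candidates = [dict(zip(threads, perm)) for perm in permutations(cores)]
    return min(candidates, key=makespan)


if __name__ == "__main__":
    for perm in permutations(("P1", "P2")):
        mapping = dict(zip(("A", "B"), perm))
        print(mapping, round(makespan(mapping)))
    print("best:", best_assignment())
```

Under these assumptions, placing thread A on P1 and thread B on P2 finishes sooner, because thread A loses far more IPC than thread B when moved to P2.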
![Page 8: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/8.jpg)
Concepts: Assignment Policies
• Static thread assignment
◦ Usually assigns threads to the faster core.
◦ A well-studied problem, analyzed before assignment.
◦ Solutions rely on heuristics.
• Random static assignment: the workloads and IPC are not known.
• Pseudo-best static assignment: the workloads and IPC are known, and a heuristic is used to find the mapping.
• Disadvantages:
◦ Does not reassign threads at run time.
◦ Does not optimize usage of the faster core(s).
◦ "Slow" threads on slower core(s) penalize overall system performance.
![Page 9: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/9.jpg)
Concepts: Assignment Policies
• Dynamic thread assignment
◦ Round robin assignment: rotates the assignment of threads to processors in a round robin fashion.
◦ Ensures that the available faster cores are equally shared among the running programs.
![Page 10: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/10.jpg)
Concepts: Assignment Policies
• IPC driven assignment
◦ Considers the characteristics of the executing threads.
◦ Looks at each thread's IPC and the IPC ratio between the two cores to decide the thread mapping.
◦ Threads with a higher ratio run on the faster core.
◦ Threads with a lower ratio run on the slower core.
![Page 11: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/11.jpg)
Simulation Approach
• Goal: duplicate the experiment in the (peer-reviewed) paper
• 2-phase simulation
◦ 1) Obtain IPC trace values for Spec2000 programs, using the M5 simulator with Alpha EV5 + EV6 cores
◦ 2) Use our own simulator to model various heterogeneous CMP configurations and evaluate assignment policies
![Page 12: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/12.jpg)
Which simulator is suitable?
• Rsim
• Simple MP
• SimOS
• Simics
• TFsim
• SimFlex
• GEMS
![Page 13: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/13.jpg)
Introduction & Overview
• What is M5?
• A brief peek inside
![Page 14: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/14.jpg)
What is M5?
• A modular platform for simulating systems
• Encompasses
◦ system-level architecture
◦ processor microarchitecture
![Page 15: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/15.jpg)
Key properties of M5
• Pervasively object-oriented
• Multiple interchangeable CPU models
• Event-driven memory system
• Multiprocessor / multi-system capability
![Page 16: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/16.jpg)
Overview of M5 Architecture
[Figure: block diagram of CPU, L1 cache, L2 cache, buses, bus bridges, memory, and I/O device]
![Page 17: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/17.jpg)
M5's Architecture
• CPU models
• ISA
• Memory system
• Cache
• Buses
![Page 18: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/18.jpg)
CPU model
• A simple CPU model
• 2 detailed CPU models
![Page 19: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/19.jpg)
CPU model
[Figure: pipeline stages Fetch, Decode, Rename, Issue/Execute/Writeback, Commit, with backward communication between stages]
![Page 20: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/20.jpg)
Instruction Set Architecture (ISA)
• Goal: allow a human-readable ISA description
• Two parts
◦ A simple part: describes the decode
◦ A declaration part: describes the global information
![Page 21: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/21.jpg)
Memory System
• Goal: combine the timing and functional models into one model
◦ Simplify the memory system code
◦ Make changes easier
![Page 22: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/22.jpg)
Memory Architecture
[Figure: caches, a bus, and memories connected through ports, each port linked to a peer port]
![Page 23: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/23.jpg)
Cache
• Coherency
• Prefetching
◦ BasePrefetcher
◦ Prefetcher
◦ GHBPrefetcher, StridePrefetcher, TaggedPrefetcher
![Page 24: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/24.jpg)
BUSES
• Memory, I/O, CPUs
• Master: closer to memory
• Slave: closer to CPU
![Page 25: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/25.jpg)
Configuring the M5 Simulator
• Setup for M5 simulator
◦ Windows Vista host running Fedora Core under VMware
• Download the simulator from the website
◦ www.m5sim.org (open source)
• Required software:
◦ g++, python, scons, zlib, swig
![Page 26: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/26.jpg)
Building, Compiling and Running M5
• FS mode
◦ Full System mode. This mode simulates a complete system including a kernel, I/O devices, etc. This mode currently only works with the ALPHA architecture.
• SE mode
◦ Syscall Emulation mode. This mode simulates statically compiled binaries by functionally emulating any syscall they make.
• Example commands to build and run M5
◦ % scons build/ALPHA_SE/m5.debug
◦ % ./build/ALPHA_SE/m5.debug configs/example/se.py
![Page 27: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/27.jpg)
Cross Compilation
• What is cross compilation?
◦ Compiling a program for a target platform different from the platform the compiler is run on
• M5 test programs must be compiled for Alpha + Linux
• Why?
◦ M5 implements the Alpha ISA and Linux syscalls
• Since we don't own Alpha hardware: cross-compile
![Page 28: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/28.jpg)
Cross Compilation: Take 1
• The build toolchain must be built for the specific target
◦ gcc, glibc, binutils, etc.
• Dan Kegel's crosstool makes this easier: http://www.kegel.com/crosstool
• Of the 3 Spec2000 programs we considered, we were only able to successfully cross compile gzip
![Page 29: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/29.jpg)
Cross Compilation: Take 2
• Scour the net until you run across this link:
◦ http://arch.cs.duke.edu/spec2000binaries.tar.bz2
◦ All Spec2000 binaries compiled for alpha-linux!
![Page 30: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/30.jpg)
M5 Output
• M5 produces simulation results at end:

---------- Begin Simulation Statistics ----------
host_inst_rate 86899 # Simulator instruction rate (inst/s)
host_mem_usage 543680 # Number of bytes of host memory used
host_seconds 0.07 # Real time elapsed on the host
host_tick_rate 28827895 # Simulator tick rate (ticks/s)
sim_freq 1000000000000 # Frequency of simulated ticks
sim_insts 5997 # Number of instructions simulated
sim_seconds 0.000002 # Number of seconds simulated
sim_ticks 2005326 # Number of ticks simulated
system.cpu0.dtb.accesses 0 # DTB accesses
system.cpu0.dtb.acv 0 # DTB access violations
system.cpu0.dtb.hits 0 # DTB hits
system.cpu2.num_refs 1960 # Number of memory references
:
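Each statistics line above has the shape "name value # description", so the dump is easy to post-process. The following is a minimal sketch of such a reader, not part of M5 itself; the function name `parse_stats` and the embedded sample are illustrative, with the sample values copied from the output shown on this slide.

```python
SAMPLE = """\
---------- Begin Simulation Statistics ----------
host_seconds 0.07 # Real time elapsed on the host
sim_insts 5997 # Number of instructions simulated
sim_ticks 2005326 # Number of ticks simulated
system.cpu2.num_refs 1960 # Number of memory references
"""


def parse_stats(text):
    """Return {stat_name: numeric_value} for lines shaped 'name value # comment'."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing description, then expect exactly two fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # banner lines, blank lines, continuation markers
        name, value = fields
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # non-numeric value; skipped in this sketch
    return stats


if __name__ == "__main__":
    for name, value in parse_stats(SAMPLE).items():
        print(name, value)
```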
![Page 31: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/31.jpg)
Getting M5 to Output Trace
• We want an IPC trace every 1 million cycles
• So we patched:

diff -Naur src/cpu/o3/cpu.cc /Users/klange/src/thirdparty/m5_2.0b4/src/cpu/o3/cpu.cc
--- src/cpu/o3/cpu.cc 2007-11-01 19:13:05.000000000 -0600
+++ /Users/klange/src/thirdparty/m5_2.0b4/src/cpu/o3/cpu.cc 2007-12-01 22:54:38.000000000 -0700
@@ -422,6 +422,21 @@
 
     ++numCycles;
 
+    ++totalCycles; // we could use numCycles...if only i could figure out how to stringificate
+    ++currentCycles;
+    if (currentCycles >= 1000000) {
+        double currentIpc = (double)currentCommittedInsts / (double)currentCycles;
+
+        cout << "IPC: "
+             << totalCycles << ","
+             << totalCommittedInstsInt << ","
+             << currentIpc << std::endl;
+
+        currentCommittedInsts = 0;
+        currentCycles = 0;
+    }
+
+
 //    activity = false;
 
     //Tick each of the stages
@@ -452,8 +467,10 @@
     if (removeInstsThisCycle) {
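The windowing logic in the patch can be restated outside of M5 as a small model. This sketch mirrors the counters in the diff (total cycles, cycles in the current window, instructions committed in the current window) and emits one sample per window. It assumes the committed-instruction counters are updated once per cycle, which the excerpt above does not show; the class name `IpcTracer` and its fields are illustrative, not M5 identifiers.

```python
class IpcTracer:
    """Emit (total_cycles, total_insts, window_ipc) every `window` cycles."""

    def __init__(self, window=1_000_000):
        self.window = window
        self.total_cycles = 0
        self.total_insts = 0
        self.window_cycles = 0
        self.window_insts = 0
        self.samples = []

    def tick(self, committed):
        """Advance one cycle in which `committed` instructions retired."""
        self.total_cycles += 1
        self.total_insts += committed
        self.window_cycles += 1
        self.window_insts += committed
        if self.window_cycles >= self.window:
            ipc = self.window_insts / self.window_cycles
            self.samples.append((self.total_cycles, self.total_insts, ipc))
            # Reset only the per-window counters, as the patch does.
            self.window_insts = 0
            self.window_cycles = 0


if __name__ == "__main__":
    tracer = IpcTracer(window=4)  # tiny window so the demo prints something
    for committed in [1, 2, 0, 1, 2, 2, 2, 2]:
        tracer.tick(committed)
    for total_cycles, total_insts, ipc in tracer.samples:
        print(f"IPC: {total_cycles},{total_insts},{ipc}")
```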
![Page 32: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/32.jpg)
Build the processor core
![Page 33: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/33.jpg)
EV5 configuration on M5
![Page 34: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/34.jpg)
EV6 configuration on M5
![Page 35: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/35.jpg)
Simulation Approach
• Goal: duplicate the experiment in the (peer-reviewed) paper
• 2-phase simulation
◦ 1) Obtain IPC trace values for Spec2000 programs, using the M5 simulator with Alpha EV5 + EV6 cores
◦ 2) Use our own simulator to model various heterogeneous CMP configurations and evaluate assignment policies
![Page 36: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/36.jpg)
Choosing Workload
• Spec 2000
• Paper:
◦ gzip
◦ gcc
◦ crafty (chess program)
◦ parser (natural language processor)
◦ bzip2
◦ wupwise (quantum chromodynamics)
◦ swim (shallow water modeling)
◦ mgrid (multi-grid solver in 3D potential field)
◦ galgel (fluid dynamics modeling)
◦ equake (earthquake modeling)
◦ lucas (prime number test)
• Us:
◦ gzip
◦ bzip2
◦ crafty
![Page 37: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/37.jpg)
Workload Input
• Spec 2000 input is proprietary
• Compromise:
◦ gzip/bzip2 input: Shakespeare plays
◦ crafty input: sample chess game
![Page 38: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/38.jpg)
IPC Traces
• Obtained from M5
![Page 39: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/39.jpg)
IPC Traces
![Page 40: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/40.jpg)
IPC Traces
![Page 41: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/41.jpg)
CMP Simulator
• Written in Java
• Modular design
◦ Core simulator module
◦ Common thread-assignment policy interface
◦ Policy modules: Static, Round Robin (dynamic), IPC-Driven (dynamic)
![Page 42: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/42.jpg)
CMP Simulator
• Command-line interface
◦ Example: CMPSim spec2000 10 2 1 roundrobin
• Input:
◦ Workload
◦ Number of threads (selected randomly from 3 Spec 2000 programs)
◦ # EV5 cores
◦ # EV6 cores
◦ Thread assignment policy
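To make the command-line example concrete, here is a small sketch of how its five arguments could be read. The original simulator was written in Java; this Python version is only an illustration. It assumes the positional order matches the order of the input list on this slide (workload, threads, EV5 cores, EV6 cores, policy); `SimArgs` and `parse_args` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SimArgs:
    workload: str      # e.g. "spec2000"
    num_threads: int   # threads drawn from the workload's programs
    num_ev5: int       # number of EV5 cores
    num_ev6: int       # number of EV6 cores
    policy: str        # e.g. "roundrobin"


def parse_args(argv):
    """Parse '<workload> <threads> <ev5> <ev6> <policy>' (assumed order)."""
    if len(argv) != 5:
        raise ValueError("expected: <workload> <threads> <ev5> <ev6> <policy>")
    workload, threads, ev5, ev6, policy = argv
    return SimArgs(workload, int(threads), int(ev5), int(ev6), policy)


if __name__ == "__main__":
    print(parse_args("spec2000 10 2 1 roundrobin".split()))
```

Read this way, the slide's example would request 10 threads on a system with 2 EV5 cores and 1 EV6 core under the round robin policy.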
![Page 43: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/43.jpg)
CMP Simulator
• Output:

Threads,Experiment,System IPC
1,20EV5 RR,0.905097784767538
2,20EV5 RR,1.46127036511788
3,20EV5 RR,2.06244067869053
4,20EV5 RR,2.78590633860981
5,20EV5 RR,3.35373843898152
6,20EV5 RR,4.07299579068557
7,20EV5 RR,4.17449020511364
8,20EV5 RR,4.915937425
9,20EV5 RR,5.47383727613636
10,20EV5 RR,6.00090476193182
11,20EV5 RR,6.64824888522727
12,20EV5 RR,7.26460146590909
13,20EV5 RR,7.90477401704545
14,20EV5 RR,8.46545665397727
15,20EV5 RR,9.23393584545455
16,20EV5 RR,9.80104248465909
17,20EV5 RR,10.3671315159091
![Page 44: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/44.jpg)
IPC data are temporal sequences
CMP Simulator Issue
![Page 45: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/45.jpg)
Static Policy
• Randomly assign threads to cores at startup
• Repeat the process whenever a core becomes idle
• Weaknesses:
◦ When a core becomes idle, it stays idle unless an unassigned thread exists.
◦ In a heterogeneous system, this results in underutilization of "faster" cores.
◦ Execution of "slow" threads on "slower" cores may penalize overall system performance.
![Page 46: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/46.jpg)
Round Robin Policy
• Randomly assign threads to cores at startup
• Define swap_period
◦ Experimentally, swap_period = 20M cycles works well
• if (current_cycle % swap_period == 0)
◦ Migrate thread from EV6 -> wait queue
◦ Migrate thread from EV5 -> EV6
◦ Migrate thread from wait queue -> EV6
• When a core becomes idle, assign the longest-waiting thread
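The rotation described above can be sketched with three queues. This is a simplified illustration, not the authors' Java code. It rotates one thread per swap, and it assumes that the waiting thread fills the EV5 slot vacated by the promoted thread (the slide names EV6 as the destination of the third step, but in this sketch the EV6 slot is already occupied by then). It also skips the swap at cycle 0. All names are hypothetical.

```python
from collections import deque

SWAP_PERIOD = 20_000_000  # cycles; the value the slide reports as working well


def rotate_once(ev6, ev5, waiting):
    """Rotate one thread through EV6 -> wait queue, EV5 -> EV6, queue -> EV5."""
    if not ev6 or not ev5:
        return  # nothing to rotate on a homogeneous or empty system
    waiting.append(ev6.popleft())   # thread leaves the fast core
    ev6.append(ev5.popleft())       # thread from a slow core is promoted
    ev5.append(waiting.popleft())   # longest-waiting thread takes the free slot


def maybe_swap(current_cycle, ev6, ev5, waiting):
    """Apply the swap on period boundaries; returns True if a swap happened."""
    # Skipping cycle 0 is an assumption: threads were just assigned at startup.
    if current_cycle > 0 and current_cycle % SWAP_PERIOD == 0:
        rotate_once(ev6, ev5, waiting)
        return True
    return False


if __name__ == "__main__":
    ev6, ev5, waiting = deque(["A"]), deque(["B", "C"]), deque(["D"])
    maybe_swap(SWAP_PERIOD, ev6, ev5, waiting)
    print(list(ev6), list(ev5), list(waiting))
```

Because the evicted thread is appended to the wait queue before the queue is popped, the sketch also works when no thread was waiting: the evicted thread simply moves to the freed EV5 slot.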
![Page 47: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/47.jpg)
Modeling Thread Migration
• Costs
◦ Inter-core context switch: PC, registers, etc. must be transferred
◦ Cache warmup
• Simple model
◦ switch_loss: 50%
◦ switch_duration: 1M cycles
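One way to read the simple model above is that a migrated thread runs at a reduced rate for a fixed number of cycles. The sketch below assumes exactly that interpretation (the thread's IPC is scaled by 1 - switch_loss for the first switch_duration cycles after a migration); the slide does not spell out how the two parameters are applied, so treat this as an illustration only.

```python
SWITCH_LOSS = 0.5            # fraction of IPC lost while migrating (slide value)
SWITCH_DURATION = 1_000_000  # cycles the penalty lasts (slide value)


def committed_instructions(ipc, cycles, migrated):
    """Instructions retired over `cycles`, with an optional migration penalty.

    Assumed model: after a migration, the first SWITCH_DURATION cycles run
    at ipc * (1 - SWITCH_LOSS); the remaining cycles run at full ipc.
    """
    if not migrated:
        return ipc * cycles
    penalized = min(cycles, SWITCH_DURATION)
    return ipc * (1.0 - SWITCH_LOSS) * penalized + ipc * (cycles - penalized)


if __name__ == "__main__":
    period = 20_000_000  # one swap period
    base = committed_instructions(2.0, period, migrated=False)
    moved = committed_instructions(2.0, period, migrated=True)
    print(base, moved, (base - moved) / base)
```

Under this reading, a migration at the start of a 20M-cycle swap period costs 2.5% of the instructions the thread would otherwise retire in that period.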
![Page 48: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/48.jpg)
No effort is made to optimize thread-to-core mapping
Round Robin Weakness
![Page 49: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/49.jpg)
IPC-Driven Policy
• Optimize thread-to-core mapping
• Define IPC ratio = EV6 IPC / EV5 IPC
• Heuristic: threads with the highest IPC ratio are assigned to EV6
• System must compute the average IPC for each core type
◦ Requires forced migrations
• To handle IPC spikes, use a weighted average:
◦ Current IPC * 0.65 + Previous IPC * 0.35
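The two formulas on this slide can be written out directly. This is a minimal sketch using the slide's weights (0.65 and 0.35); the function names and the sample numbers in the usage lines are illustrative and not taken from the experiments.

```python
def weighted_ipc(current, previous, w_current=0.65, w_previous=0.35):
    """Smooth an IPC sample against the previous one to damp spikes."""
    return current * w_current + previous * w_previous


def ipc_ratio(ev6_ipc, ev5_ipc):
    """How much a thread gains from running on EV6 rather than EV5."""
    return ev6_ipc / ev5_ipc


if __name__ == "__main__":
    # Hypothetical samples for one thread on each core type.
    ev6 = weighted_ipc(current=2.0, previous=1.0)
    ev5 = weighted_ipc(current=1.0, previous=1.0)
    print(ev6, ev5, ipc_ratio(ev6, ev5))
```

With these weights, a one-window spike moves the smoothed value only 65% of the way toward the new sample.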
![Page 50: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/50.jpg)
IPC-Driven Policy
• Randomly assign threads to cores at startup
• Again, define swap_period
◦ Experimentally, swap_period = 20M cycles works well
• if (current_cycle % swap_period == 0)
◦ Sort threads by weighted IPC ratio
◦ Migrate accordingly
• When a core becomes idle, assign the thread from the wait queue with the highest IPC ratio
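The "sort, then migrate accordingly" step can be illustrated as a ranking problem. The sketch below assumes each thread already has a weighted IPC ratio and that each core runs one thread; threads beyond the core count stay in the wait queue. The function name and the ratio values are hypothetical.

```python
def assign_by_ipc_ratio(ratios, n_ev6, n_ev5):
    """Map threads to core types, giving EV6 to the highest IPC ratios.

    ratios: {thread_name: weighted EV6/EV5 IPC ratio}
    Returns a dict with the thread names placed on EV6, on EV5, and waiting.
    """
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return {
        "EV6": ranked[:n_ev6],
        "EV5": ranked[n_ev6:n_ev6 + n_ev5],
        "waiting": ranked[n_ev6 + n_ev5:],
    }


if __name__ == "__main__":
    # Hypothetical ratios, for illustration only.
    ratios = {"gzip#1": 1.4, "bzip2#1": 2.1, "crafty#1": 1.8, "gzip#2": 1.1}
    print(assign_by_ipc_ratio(ratios, n_ev6=1, n_ev5=2))
```

A real policy would then compare this target mapping with the current one and migrate only the threads whose core type changed, since each migration carries a cost.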
![Page 51: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/51.jpg)
Verifying Simulator
![Page 52: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/52.jpg)
Experiments
• Goal: verify the results of the paper
• Repeat their experiments
![Page 53: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/53.jpg)
Experiment #1
• Policy comparison
◦ Static vs. Round Robin vs. IPC-Driven
◦ Heterogeneous system: 5 x EV5, 3 x EV6
![Page 54: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/54.jpg)
Expected Policy Results
![Page 55: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/55.jpg)
Actual Policy Results
![Page 56: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/56.jpg)
Experiment #2
• Heterogeneous vs. homogeneous system
• Let 1 EV6 = 5 EV5
◦ Based on die areas
• Configurations
◦ 20 EV5
◦ 10 EV5, 2 EV6
◦ 5 EV5, 3 EV6
◦ 4 EV6
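The four configurations above are meant to occupy the same die area. A quick check using the slide's equivalence (1 EV6 = 5 EV5) confirms this; the function and constant names are illustrative.

```python
EV6_AREA_IN_EV5 = 5  # slide's equivalence: one EV6 takes the area of five EV5s

# (number of EV5 cores, number of EV6 cores) for each configuration
CONFIGS = [(20, 0), (10, 2), (5, 3), (0, 4)]


def die_area(n_ev5, n_ev6):
    """Total die area expressed in EV5-sized units."""
    return n_ev5 + n_ev6 * EV6_AREA_IN_EV5


if __name__ == "__main__":
    for n_ev5, n_ev6 in CONFIGS:
        print(f"{n_ev5} EV5 + {n_ev6} EV6 -> {die_area(n_ev5, n_ev6)} units")
```

Every configuration comes to 20 EV5-equivalent units, which is what makes the heterogeneous and homogeneous systems comparable at a fixed total die size.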
![Page 57: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/57.jpg)
Expected Heterogeneous Results
![Page 58: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/58.jpg)
Actual Heterogeneous Results
![Page 59: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/59.jpg)
Experiment Limitations
• Simulator neglects L2 cache contention!
• Simplified thread migration model
• Only used 3 Spec 2000 programs
◦ Paper used 11
• Didn't have access to Spec 2000 inputs
• Our EV5 and EV6 configurations were not perfect
◦ Lack of M5 documentation made this difficult
![Page 60: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/60.jpg)
Project Organization
• Google Code
◦ Source control
◦ Wiki
![Page 61: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/61.jpg)
Conclusion
• Confirmed that dynamic thread assignment outperforms static thread assignment
• Unable to confirm that heterogeneous outperforms homogeneous
◦ Limitations of minimal Spec 2000 workload
• Learned how to design a complex, peer-reviewed experiment
![Page 62: Kris Lange Nopparat suwaanarat Pree Thiengburanathum.](https://reader030.fdocuments.us/reader030/viewer/2022032707/56649e4b5503460f94b404fc/html5/thumbnails/62.jpg)
Questions?