HITS: A HIGH THROUGHPUT
MEMORY SCHEDULING SCHEME TO
MITIGATE DENIAL-OF-SERVICE
ATTACKS IN MULTI-CORE SYSTEMS
Mansour Shafaei and Yunsi Fei
Electrical and Computer Engineering
Northeastern University, Boston, MA
Outline
DoS attack in multicore systems
Background on DRAM memory
Related work
Our high-throughput DoS mitigation approach − HiTS
Experimental results
Conclusion
1 of 19
DRAM/Memory Controller in Multi-core Systems
Off-chip shared DRAM
  Multiple banks
  Unit for access: row
On-chip DRAM controller
  Bank-specific request buffers
  DRAM scheduler
    Bank scheduler
    Bus scheduler
2 of 19
DRAM Memory/DRAM Controller (Cont.)
Operations on memory banks to serve memory requests
depend on the memory address (row address)
✓ Row hit (column decoding)
₋ Row closed (row+column decoding)
× Row conflict (precharge and row+column decoding)
Traditional schedulers such as FR-FCFS (First-Ready,
First-Come First-Serve; row hits first) [S. Rixner, ISCA'00]
✓ Increasing DRAM throughput by prioritizing “Row hit” requests
over others
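The row-hit-first policy above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the `Request` fields and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int   # arrival time, used for the FCFS tiebreak
    row: int       # DRAM row the request targets

def pick_next(queue, open_row):
    """FR-FCFS: among pending requests for a bank, serve row hits
    (requests to the currently open row) first; break ties by age."""
    hits = [r for r in queue if r.row == open_row]
    pool = hits if hits else queue
    return min(pool, key=lambda r: r.arrival)

q = [Request(arrival=0, row=7), Request(arrival=1, row=3)]
print(pick_next(q, open_row=3).arrival)  # row hit wins despite arriving later: 1
```

Note that the older request (row 7) is bypassed as long as row-3 hits keep arriving, which is exactly the starvation risk the next slides discuss.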
3 of 19
Denial of Service Attack (DoS)
Definition: An attempt to make a machine or network
resource unavailable to its intended users
Software
User applications
System applications
Network
Hardware
4 of 19
DoS Vulnerability in Multi-core Memory
Different threads exhibit different
  Temporal locality (cache miss rate) – memory request demand
  Spatial locality (row-buffer locality)
FR-FCFS is thread-oblivious
  Row-hit First, First-Come First-Serve only considers the bank status
  ✓ Best performance in single-core machines
  × Not the best (if not the worst) in multi-cores
Results in unfair distribution of DRAM service
  It favors threads with high row-buffer localities but may starve
  threads with low row-buffer localities – DoS
5 of 19
Previous Work – TCM (Thread Cluster
Memory Scheduling) [Y. Kim, Micro'10]
Periodically clusters threads based on attained memory service
  Threads in the higher-ranked cluster are ranked further based on cache miss rate
    Ignores differences in row-buffer localities
  Shuffles ranks among threads in the lower-ranked cluster
    Overlooks differences in memory demands
Prioritizes memory requests of higher-ranked threads over others
  × Does not consider the row-buffer status
  × Too many row-buffer conflicts – may hurt system performance even though
  fairness among threads is improved
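TCM's periodic ranking step can be sketched roughly as below. This is a simplified illustration of the clustering idea only; the real TCM uses a bandwidth-fraction threshold and a structured shuffle, and the dictionary keys (`service`, `mpki`) are names invented for the example.

```python
import random

def tcm_rank(threads, cluster_frac):
    """Rough sketch of TCM-style periodic ranking.
    threads: {tid: {"service": attained memory service,
                    "mpki": cache misses per kilo-instruction}}."""
    by_service = sorted(threads, key=lambda t: threads[t]["service"])
    cut = max(1, int(len(by_service) * cluster_frac))
    latency_cluster = by_service[:cut]     # low attained service: prioritized
    bandwidth_cluster = by_service[cut:]
    # Higher-ranked cluster: rank further by cache miss rate (low MPKI first).
    ranked = sorted(latency_cluster, key=lambda t: threads[t]["mpki"])
    # Lower-ranked cluster: shuffle ranks periodically for fairness.
    random.shuffle(bandwidth_cluster)
    return ranked + bandwidth_cluster
```

The sketch makes the two criticisms above concrete: row-buffer locality appears nowhere in the ranking, and the shuffled cluster ignores differences in memory demand.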
6 of 19
Previous Work (Cont.)
Too many row-buffer conflicts due to frequent ranking
enforcement and service leakage
  On the border of time intervals: ranking updates and enforcement
  Within time intervals: service leakage
7 of 19
Row-Buffer Conflicts Overhead
Running two memory-intensive benchmarks
with low and high row-buffer localities
[Figure: row-buffer conflict overhead under FR-FCFS and TCM]
8 of 19
HiTS Scheduling Scheme
Ranking mechanism – performed periodically
  Demand-service ratio
    Explicitly considers memory demands (cache miss rates)
    Implicitly considers row-buffer locality by taking into account the
    attained memory service
Ranking enforcement
  Postpones memory service switches to the moments that pose the
  least overhead
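The ranking step can be sketched as a single sort on the demand-service ratio. A minimal illustration, assuming `demand` stands for the cache miss rate and `service` for the attained memory service; these field names are not from the talk.

```python
def hits_rank(threads):
    """Sketch of HiTS-style periodic ranking by demand-service ratio.
    threads: {tid: {"demand": cache miss rate,
                    "service": attained memory service}}."""
    def ratio(t):
        s = threads[t]["service"]
        # A thread that has received no service yet is maximally under-served.
        return threads[t]["demand"] / s if s else float("inf")
    # Highest demand-service ratio first: under-served threads rise to the top.
    return sorted(threads, key=ratio, reverse=True)
```

A thread with high demand but little attained service gets the top rank, which is how the scheme targets service proportional to demand.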
9 of 19
HiTS Scheduling Scheme (Cont.)
Ranking enforcement
  Switch when the currently running thread reaches a row-buffer conflict
    Bring in the top-ranked thread, with no additional row-buffer conflicts
  To avoid starving the top-ranked thread, preempt when it
  experiences excessive slowdown
    Threshold metric: the highest-ranked thread's micro-operation
    execution rate
    May cause row-buffer conflicts, but balances fairness
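The two enforcement rules above reduce to a small predicate. This is an illustrative sketch of the decision logic, not the hardware implementation; the parameter names are invented.

```python
def should_switch(current_row_conflict, top_uop_rate, threshold):
    """Decide whether to hand the bank to the top-ranked thread.
    current_row_conflict: the running thread just hit a row-buffer conflict
    top_uop_rate: micro-op execution rate of the top-ranked thread
    threshold: minimum acceptable execution rate for that thread."""
    if current_row_conflict:
        return True  # free switch: the row must be reopened anyway
    # Preemptive switch: top-ranked thread is slowing down too much.
    # May cost a row-buffer conflict, but restores fairness.
    return top_uop_rate < threshold
```

Switching on a conflict is "free" because the open row must be precharged regardless of which thread is served next; only the preemptive path trades throughput for fairness.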
10 of 19
HiTS: Example
Comparing three schemes:
HiTS
FR-FCFS
TCM
HiTS
Fewer row-buffer conflicts than TCM
Preserving throughput
Similar fairness to TCM
11 of 19
Evaluations
Simulators
  MARSSX86 [A. Patel, DAC'11] – cycle-accurate x86 simulator
  DRAMSim2 [P. Rosenfeld, Computer Architecture Letters'11] – cycle-accurate DDRx simulator
Benchmarks
  SPEC CPU2006 benchmark suite
12 of 19
Evaluations (Cont.)
Profiling single benchmarks
  Cache miss rate
  Row-buffer locality
Categorizing benchmarks into memory-intensive
vs. CPU-intensive based on the cache miss rate
Making multi-threaded workloads from memory-intensive
benchmarks, but with different row-buffer localities
13 of 19
Metrics
Ranking
  Cache miss rate (cache misses / K instructions)
  Attained memory service (bandwidth usage; # of served requests)
Run-time rank enforcement
  Micro-operation execution rate (IPC)
Evaluation
  Unfairness – DoS mitigation
  Average speedup – throughput
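The slides do not spell out the unfairness formula; a common definition in the memory-scheduling literature is the maximum-to-minimum slowdown ratio, sketched below under that assumption.

```python
def unfairness(slowdowns):
    """Max-to-min slowdown ratio, a common unfairness metric.
    slowdown_i = T_shared_i / T_alone_i for each thread i;
    1.0 means perfectly fair, larger means more unfair."""
    return max(slowdowns) / min(slowdowns)

# Two threads slowed down 2x and 4x versus running alone:
print(unfairness([2.0, 4.0]))  # 2.0
```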
14 of 19
Results for 8-Core
Finding the optimum execution-rate threshold
  Varying it from 1 to 4 (commit width = 4 IPC)
15 of 19
Results for 8-Core (Cont.)
Evaluating unfairness and speedup
16 of 19
Results for 16-Core and Comparison
Average unfairness reduction
            With respect to FR-FCFS   With respect to TCM
  8-Core    28.4%                     12.2%
  16-Core   23.7%                     6.3%

Average speedup improvement
            With respect to FR-FCFS   With respect to TCM
  8-Core    19.6%                     15.7%
  16-Core   6.5%                      15.2%
17 of 19
Conclusion
We propose HiTS – a high-throughput memory scheduler
to mitigate DoS attacks in multi-core systems
  Ranks threads based on their demand-service ratio
    Targets allocating service proportionally to demand, achieving fairness
  Separates ranking enforcement from ranking updates
    Poses the least overhead, preserving high throughput
Compared to FR-FCFS and TCM: better fairness and throughput
18 of 19
Questions
?
19 of 19
Backup
Machine Conf. and Benchmarks’
Characteristics
Workloads For 8-Core