HITS: A HIGH THROUGHPUT
MEMORY SCHEDULING SCHEME TO
MITIGATE DENIAL-OF-SERVICE
ATTACKS IN MULTI-CORE SYSTEMS
Mansour Shafaei and Yunsi Fei
Electrical and Computer Engineering
Northeastern University, Boston, MA
Outline
DoS attack in multicore systems
Background on DRAM memory
Related work
Our high-throughput DoS mitigation approach − HiTS
Experimental results
Conclusion
1 of 19
DRAM/Memory Controller in Multi-core Systems
Off-chip shared DRAM
  Multiple banks
  Unit for access: row
On-chip DRAM controller
  Bank-specific request buffers
  DRAM scheduler
    Bank scheduler
    Bus scheduler
2 of 19
DRAM Memory/DRAM Controller (Cont.)
Operations on memory banks to serve memory requests
depend on the memory address (row address)
✓ Row hit (column decoding)
₋ Row closed (row+column decoding)
× Row conflict (precharge and row+column decoding)
Traditional schedulers such as FR-FCFS (First-Ready,
First-Come First-Serve; row hits first) [S. Rixner, ISCA'00]
✓ Increasing DRAM throughput by prioritizing “Row hit” requests
over others
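The row-hit-first policy above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the `Request` fields and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int   # arrival time, used for the FCFS tiebreak
    row: int       # DRAM row the request targets

def pick_next(queue, open_row):
    """FR-FCFS: among pending requests for a bank, serve row hits
    (requests to the currently open row) first; break ties by age."""
    hits = [r for r in queue if r.row == open_row]
    pool = hits if hits else queue
    return min(pool, key=lambda r: r.arrival)

q = [Request(arrival=0, row=7), Request(arrival=1, row=3)]
print(pick_next(q, open_row=3).arrival)  # row hit wins despite arriving later: 1
```

Note that the older request (row 7) is bypassed as long as row-3 hits keep arriving, which is exactly the starvation risk the next slides discuss.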
3 of 19
Denial of Service Attack (DoS)
Definition: An attempt to make a machine or network
resource unavailable to its intended users
Software
User applications
System applications
Network
Hardware
4 of 19
DoS Vulnerability in Multi-core Memory
Different threads exhibit different
  Temporal locality (cache miss rate) – memory request demand
  Spatial locality (row-buffer locality)
FR-FCFS is thread-oblivious
  Row-hit First, First-Come First-Serve only considers the bank status
  ✓ Best performance in single-core machines
  × Not the best (if not the worst) in multi-cores
Results in unfair distribution of DRAM service
  It favors threads with high row-buffer localities but may starve
  threads with low row-buffer localities – DoS
5 of 19
Previous Work – TCM (Thread Cluster
Memory Scheduling) [Y. Kim, Micro'10]
Periodically clusters threads based on attained memory service
  Threads in the higher-ranked cluster are ranked further based on cache miss rate
    Ignores differences in row-buffer localities
  Shuffles ranks among threads in the lower-ranked cluster
    Overlooks differences in memory demands
Prioritizes memory requests of higher-ranked threads over others
  × Does not consider the row-buffer status
  × Too many row-buffer conflicts – may hurt system performance even though
  fairness among threads is improved
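TCM's periodic ranking step can be sketched roughly as below. This is a simplified illustration of the clustering idea only; the real TCM uses a bandwidth-fraction threshold and a structured shuffle, and the dictionary keys (`service`, `mpki`) are names invented for the example.

```python
import random

def tcm_rank(threads, cluster_frac):
    """Rough sketch of TCM-style periodic ranking.
    threads: {tid: {"service": attained memory service,
                    "mpki": cache misses per kilo-instruction}}."""
    by_service = sorted(threads, key=lambda t: threads[t]["service"])
    cut = max(1, int(len(by_service) * cluster_frac))
    latency_cluster = by_service[:cut]     # low attained service: prioritized
    bandwidth_cluster = by_service[cut:]
    # Higher-ranked cluster: rank further by cache miss rate (low MPKI first).
    ranked = sorted(latency_cluster, key=lambda t: threads[t]["mpki"])
    # Lower-ranked cluster: shuffle ranks periodically for fairness.
    random.shuffle(bandwidth_cluster)
    return ranked + bandwidth_cluster
```

The sketch makes the two criticisms above concrete: row-buffer locality appears nowhere in the ranking, and the shuffled cluster ignores differences in memory demand.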
6 of 19
Previous Work (Cont.)
Too many row-buffer conflicts due to frequent ranking
enforcement and service leakage
  On the border of time intervals: ranking updates and enforcement
  Within time intervals: service leakage
7 of 19
Row-Buffer Conflicts Overhead
Running two memory-intensive benchmarks
with low and high row-buffer localities
[Figure: row-buffer conflict overhead under FR-FCFS and TCM]
8 of 19
HiTS Scheduling Scheme
Ranking mechanism – performed periodically
  Demand-service ratio
    Explicitly considers memory demands (cache miss rates)
    Implicitly considers row-buffer locality by taking into account the
    attained memory service
Ranking enforcement
  Postpones memory service switches to the moments that pose the
  least overhead
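The ranking step can be sketched as a single sort on the demand-service ratio. A minimal illustration, assuming `demand` stands for the cache miss rate and `service` for the attained memory service; these field names are not from the talk.

```python
def hits_rank(threads):
    """Sketch of HiTS-style periodic ranking by demand-service ratio.
    threads: {tid: {"demand": cache miss rate,
                    "service": attained memory service}}."""
    def ratio(t):
        s = threads[t]["service"]
        # A thread that has received no service yet is maximally under-served.
        return threads[t]["demand"] / s if s else float("inf")
    # Highest demand-service ratio first: under-served threads rise to the top.
    return sorted(threads, key=ratio, reverse=True)
```

A thread with high demand but little attained service gets the top rank, which is how the scheme targets service proportional to demand.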
9 of 19
HiTS Scheduling Scheme (Cont.)
Ranking enforcement
  Switch when the currently running thread reaches a row-buffer conflict
    Bring in the top-ranked thread, with no additional row-buffer conflicts
  To avoid starving the top-ranked thread, preempt when it
  experiences excessive slowdown
    Threshold metric: the highest-ranked thread's micro-operation
    execution rate
    May cause row-buffer conflicts, but balances fairness
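The two enforcement rules above reduce to a small predicate. This is an illustrative sketch of the decision logic, not the hardware implementation; the parameter names are invented.

```python
def should_switch(current_row_conflict, top_uop_rate, threshold):
    """Decide whether to hand the bank to the top-ranked thread.
    current_row_conflict: the running thread just hit a row-buffer conflict
    top_uop_rate: micro-op execution rate of the top-ranked thread
    threshold: minimum acceptable execution rate for that thread."""
    if current_row_conflict:
        return True  # free switch: the row must be reopened anyway
    # Preemptive switch: top-ranked thread is slowing down too much.
    # May cost a row-buffer conflict, but restores fairness.
    return top_uop_rate < threshold
```

Switching on a conflict is "free" because the open row must be precharged regardless of which thread is served next; only the preemptive path trades throughput for fairness.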
10 of 19
HiTS: Example
Comparing three schemes:
HiTS
FR-FCFS
TCM
HiTS
Fewer row-buffer conflicts than TCM
Preserving throughput
Similar fairness to TCM
11 of 19
Evaluations
Simulators
  MARSSX86 [A. Patel, DAC'11] – cycle-accurate x86 simulator
  DRAMSim2 [P. Rosenfeld, Computer Architecture Letters'11] – cycle-accurate DDRx simulator
Benchmarks
  SPEC CPU2006 benchmark suite
12 of 19
Evaluations (Cont.)
Profiling single benchmarks
  Cache miss rate
  Row-buffer locality
Categorizing benchmarks into memory-intensive
vs. CPU-intensive based on the cache miss rate
Making multi-threaded workloads from memory-intensive
benchmarks, but with different row-buffer localities
13 of 19
Metrics
Ranking
  Cache miss rate (cache misses / K instructions)
  Attained memory service (bandwidth usage; # of served requests)
Run-time rank enforcement
  Micro-operation execution rate (IPC)
Evaluation
  Unfairness – DoS mitigation
  Average speedup – throughput
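The slides do not spell out the unfairness formula; a common definition in the memory-scheduling literature is the maximum-to-minimum slowdown ratio, sketched below under that assumption.

```python
def unfairness(slowdowns):
    """Max-to-min slowdown ratio, a common unfairness metric.
    slowdown_i = T_shared_i / T_alone_i for each thread i;
    1.0 means perfectly fair, larger means more unfair."""
    return max(slowdowns) / min(slowdowns)

# Two threads slowed down 2x and 4x versus running alone:
print(unfairness([2.0, 4.0]))  # 2.0
```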
14 of 19
Results for 8-Core
Finding the optimum execution-rate threshold
  Varying it from 1 to 4 (commit width = 4 IPC)
15 of 19
Results for 8-Core (Cont.)
Evaluating unfairness and speedup
16 of 19
Results for 16-Core and Comparison
Average unfairness reduction
            With respect to FR-FCFS   With respect to TCM
  8-Core    28.4%                     12.2%
  16-Core   23.7%                     6.3%

Average speedup improvement
            With respect to FR-FCFS   With respect to TCM
  8-Core    19.6%                     15.7%
  16-Core   6.5%                      15.2%
17 of 19
Conclusion
We propose HiTS – a high-throughput memory scheduler
to mitigate DoS attacks in multi-core systems
  Ranks threads based on their demand-service ratio
    Targets allocating service proportionally to demand, achieving fairness
  Separates ranking enforcement from ranking updates
    Poses the least overhead, preserving high throughput
Compared to FR-FCFS and TCM: better fairness and throughput
18 of 19
Questions
?
19 of 19
Backup
Machine Conf. and Benchmarks’
Characteristics
Workloads For 8-Core