Effect of Context Aware Scheduler on TLB
![Page 1: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/1.jpg)
3rd Joint Workshop on Embedded and Ubiquitous Computing
Effect of Context Aware Scheduler on TLB
Satoshi Yamada
PhD Candidate
Kusakabe Laboratory
![Page 2: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/2.jpg)
Contents
• Introduction
• Overhead of Context Switch
• Context Aware Scheduler
• Benchmark Applications and Measurement Environment
• Result
• Related Works
• Conclusion
![Page 3: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/3.jpg)
Widespread Multithreading
• Multithreading hides the latency of disk I/O and network access
• Threads in many languages, such as Java, Perl, and Python, correspond to OS threads
* More context switches happen today
* The process scheduler in the OS is more responsible for system performance
![Page 4: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/4.jpg)
Context Switch and Caches
• Overhead of a context switch
  – includes that of loading a new working set for the next process
  – is deeply related to the utilization of caches
    • Agarwal et al., “Cache performance of operating system and multiprogramming workloads” (1988)
    • Mogul et al., “The effect of context switches on cache performance” (1991)
[Figure: cache contents as execution switches between Process A and Process B. Cache regions hold data of A only, B only, or A and B; the combined working sets overflow the cache.]
![Page 5: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/5.jpg)
Advantage of Sibling Thread
[Figure: creating a process with fork() versus creating a thread with clone()]
• fork() creates a PROCESS: the child's task_struct points to copies of the parent's mm_struct, signal_struct, file information, and so on
• clone() creates a THREAD: the child's task_struct shares the parent's mm_struct, signal_struct, file information, and so on (sibling threads)
• The OS does not have to switch memory address spaces when switching between sibling threads
• We can expect a reduction in the overhead of a context switch
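The copy-versus-share distinction can be observed from user space. The following is a minimal sketch, assuming a Unix-like system where `os.fork` is available; the helper names `thread_write_is_visible` and `forked_write_is_visible` are hypothetical and only illustrate the idea. It does not reproduce the kernel structures on the slide.

```python
import os
import threading


def thread_write_is_visible():
    """A thread shares the creator's address space, so its write is seen."""
    data = {"value": 0}

    def writer():
        data["value"] = 1

    t = threading.Thread(target=writer)
    t.start()
    t.join()
    return data["value"] == 1


def forked_write_is_visible():
    """A forked child works on its own copy, so the parent sees no change."""
    data = {"value": 0}
    pid = os.fork()
    if pid == 0:
        # Child process: modify the private copy and exit immediately.
        data["value"] = 1
        os._exit(0)
    os.waitpid(pid, 0)
    return data["value"] == 1
```

The thread's write is visible to its creator, while the forked child's write is not, which is the user-level counterpart of sharing versus copying the memory descriptor.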
![Page 6: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/6.jpg)
![Page 7: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/7.jpg)
TLB flush in Context Switch
• The TLB is a cache that stores translations from virtual addresses to physical addresses
  – TLB translation latency: 1 ns
  – TLB miss overhead: several accesses to memory
• On x86 processors, most TLB entries are invalidated (flushed) on every context switch that changes the memory address space
A TLB flush does not happen in a context switch between sibling threads
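To make the flush behavior concrete, here is a toy simulation, not a model of real x86 hardware. It assumes a small fully associative TLB with LRU replacement, that each time slice touches the same few pages of its address space, and that a flush happens only when the address space identifier changes. The names `ToyTLB` and `run` are hypothetical.

```python
from collections import OrderedDict


class ToyTLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> present
        self.current_mm = None        # address space currently loaded
        self.misses = 0
        self.flushes = 0

    def switch_to(self, mm):
        """Context switch: flush only when the address space changes."""
        if mm != self.current_mm:
            if self.current_mm is not None:
                self.entries.clear()
                self.flushes += 1
            self.current_mm = mm

    def access(self, page):
        if page in self.entries:
            self.entries.move_to_end(page)        # hit: refresh LRU position
        else:
            self.misses += 1
            self.entries[page] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def run(schedule, pages_per_slice=8, capacity=64):
    """schedule is a list of (thread_name, address_space_id) time slices."""
    tlb = ToyTLB(capacity)
    for _thread, mm in schedule:
        tlb.switch_to(mm)
        for page in range(pages_per_slice):
            tlb.access(page)
    return tlb.misses, tlb.flushes


# Two processes P and Q, each with two sibling threads.
interleaved = [("A1", "P"), ("B1", "Q"), ("A2", "P"), ("B2", "Q")]
grouped = [("A1", "P"), ("A2", "P"), ("B1", "Q"), ("B2", "Q")]
```

Under these assumptions, `run(interleaved)` gives 32 misses and 3 flushes, while `run(grouped)` gives 16 misses and 1 flush, because the second sibling thread finds its translations still cached.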
![Page 8: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/8.jpg)
Overhead due to a context switch
(measured by lat_ctx in LMbench)
![Page 9: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/9.jpg)
![Page 10: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/10.jpg)
O(1) Scheduler in Linux
• O(1) scheduler runqueue has
  – an active queue and an expired queue
  – a priority bitmap and an array of linked lists of threads
• O(1) scheduler
  – searches the priority bitmap
  – chooses a thread with the highest priority
Scheduling overhead is independent of the number of threads
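The runqueue structure can be sketched in a few lines. This is a simplified illustration, not the kernel implementation: the class names `PriorityArray` and `RunQueue` are hypothetical, the 140 priority levels and the "lower number means higher priority" convention follow Linux 2.6, and a Python integer stands in for the fixed-size bitmap.

```python
from collections import deque

NUM_PRIORITIES = 140  # Linux 2.6 uses 140 levels; lower number = higher priority


class PriorityArray:
    """Priority bitmap plus one FIFO list of threads per priority level."""

    def __init__(self):
        self.bitmap = 0
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.count = 0

    def enqueue(self, thread, prio):
        self.queues[prio].append(thread)
        self.bitmap |= 1 << prio
        self.count += 1

    def highest(self):
        """Index of the lowest set bit, i.e. the best non-empty priority."""
        if self.bitmap == 0:
            return None
        return (self.bitmap & -self.bitmap).bit_length() - 1

    def dequeue_best(self):
        prio = self.highest()
        if prio is None:
            return None
        queue = self.queues[prio]
        thread = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << prio)  # level is empty: clear its bit
        self.count -= 1
        return thread, prio


class RunQueue:
    """Active and expired arrays; swapping them starts a new epoch."""

    def __init__(self):
        self.active = PriorityArray()
        self.expired = PriorityArray()

    def expire(self, thread, prio):
        """A thread that used up its time slice waits for the next epoch."""
        self.expired.enqueue(thread, prio)

    def pick_next(self):
        if self.active.count == 0:
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_best()
```

Picking the next thread costs one bitmap search and one list removal, whatever the number of queued threads, which is where the constant scheduling overhead comes from.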
![Page 11: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/11.jpg)
Context Aware (CA) Scheduler
• CA scheduler creates auxiliary runqueues per group of threads
• CA scheduler compares Preg and Paux
  – Preg: the highest priority in the regular O(1) scheduler runqueue
  – Paux: the highest priority in the auxiliary runqueue
• If Preg - Paux <= threshold, the thread with priority Paux is chosen
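The selection rule can be written as a small function. This sketch assumes that priorities are expressed so that a larger value means a higher priority, which makes Preg - Paux a non-negative gap (Linux internally numbers priorities the other way around). The function name `choose_next` and the use of `None` for an empty auxiliary runqueue are hypothetical.

```python
def choose_next(p_reg, p_aux, threshold):
    """Apply the rule Preg - Paux <= threshold.

    p_reg: highest priority in the regular runqueue.
    p_aux: highest priority in the auxiliary runqueue, or None if it is empty.
    Assumption: a larger number means a higher priority.
    """
    if p_aux is not None and p_reg - p_aux <= threshold:
        return "aux"      # run the thread from the auxiliary runqueue
    return "regular"      # fall back to the regular O(1) choice


# With a gap of 5 between the two candidates:
small_threshold = choose_next(10, 5, threshold=1)
large_threshold = choose_next(10, 5, threshold=10)
```

Here `small_threshold` is `"regular"` and `large_threshold` is `"aux"`: a larger threshold lets the scheduler prefer the auxiliary runqueue even when its best thread is further behind in priority.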
![Page 12: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/12.jpg)
Context Aware Scheduler
[Figure: run order of threads A to E under each scheduler]
• Linux O(1) scheduler: context switches between processes occur 3 times
• CA scheduler: context switches between processes occur 1 time
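The counts in the figure can be reproduced with a short calculation. The grouping below is an assumption chosen to be consistent with the counts of 3 and 1 (threads A, C, D in one process and B, E in another); the original figure's grouping is not recoverable from the transcript, and the names `count_process_switches`, `process_of`, `o1_order`, and `ca_order` are hypothetical.

```python
def count_process_switches(order, process_of):
    """Count adjacent pairs in the run order that belong to different processes."""
    switches = 0
    for prev, nxt in zip(order, order[1:]):
        if process_of[prev] != process_of[nxt]:
            switches += 1
    return switches


# Assumed grouping: A, C, D are sibling threads; B, E are sibling threads.
process_of = {"A": "X", "C": "X", "D": "X", "B": "Y", "E": "Y"}

o1_order = ["A", "B", "C", "D", "E"]  # strict priority order
ca_order = ["A", "C", "D", "B", "E"]  # sibling threads run back to back

o1_switches = count_process_switches(o1_order, process_of)
ca_switches = count_process_switches(ca_order, process_of)
```

With this grouping, `o1_switches` is 3 and `ca_switches` is 1. Only switches between different processes change the address space, so only those are counted.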
![Page 13: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/13.jpg)
Fairness
• O(1) scheduler keeps fairness by epoch
  – cycles of the active queue and the expired queue
• CA scheduler also follows epochs
  – guarantees the same level of fairness as the O(1) scheduler
![Page 14: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/14.jpg)
![Page 15: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/15.jpg)
Benchmarks
• Java
  – Volano Benchmark (Volano)
  – lusearch program in DaCapo benchmark suite (DaCapo)
• C
  – Chat benchmark (Chat)
  – memory program in SysBench benchmark suite (SysBench)
Information on Each Benchmark Application
![Page 16: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/16.jpg)
Measurement Environment
• Hardware
• Sun’s J2SE 5.0
• Threshold of the context aware scheduler
  – 1 and 10
• Perfctr to count the TLB misses
• GNU’s time command to measure the total system performance
![Page 17: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/17.jpg)
![Page 18: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/18.jpg)
Effect on TLB
• CA scheduler significantly reduces TLB misses
• A bigger threshold is more effective
  – priorities change frequently because of dynamic priority, especially in DaCapo and Volano
Results of TLB misses (million times)
![Page 19: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/19.jpg)
Effect on System Performance
Results by time command (seconds)
Results of the Counters in Each Application (seconds)
CA scheduler
• enhances the throughput of every application
• reduces the total elapsed time by 43%
![Page 20: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/20.jpg)
![Page 21: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/21.jpg)
H. L. Sujay Parekh, et al., “Thread Sensitive Scheduling for SMT Processors” (2000)
• Parekh’s scheduler
  – tries groups of threads to execute in parallel and samples information about
    • IPC
    • TLB misses
    • L2 cache misses, etc.
  – schedules based on the sampled information
Sampling Phase Scheduling Phase Sampling Phase Scheduling Phase
![Page 22: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/22.jpg)
Pranay Koka, et al., “Opportunities for Cache Friendly Process” (2005)
• Koka’s scheduler
  – traces the execution of each thread
  – focuses on the memory space shared between threads
  – schedules based on the information above
Tracing Phase Scheduling Phase Tracing Phase Scheduling Phase
![Page 23: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/23.jpg)
Conclusion
• Conclusion
  – CA scheduler is effective in reducing TLB misses
  – CA scheduler enhances the throughput of every application
• Future Works
  – Evaluation on other platforms
  – Investigation of fairness within an epoch
    • compare with Completely Fair Scheduler (Linux 2.6.23)
![Page 24: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/24.jpg)
![Page 25: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/25.jpg)
Context Aware (CA) scheduler
Our CA scheduler aggregates sibling threads
![Page 26: Effect of Context Aware Scheduler on TLB](https://reader035.fdocuments.us/reader035/viewer/2022062803/56814680550346895db3a19c/html5/thumbnails/26.jpg)
Results of Context Switch (microseconds)
L2 cache size: 2MB
[Figure: working sets of Process A, Process B, and Process C placed in the cache, with marks at 0, 1MB, and 2MB]