15/02/2006 CHEP 06 1
Measuring Quality of Service on Worker Node in Cluster
Rohitashva Sharma, R S Mundada, Sonika Sachdeva, P S Dhekne, Computer Division, BARC, Mumbai, India
Helge Meinhard, Tony Cass, Olof Barring, CERN Geneva, Switzerland
INTRODUCTION
- Quality of Service (QoS)
  - Defines goodness of a node for a type of task
  - Needed for better/optimum utilization of resources
- Computer Division, BARC and IT Division, CERN collaborated to explore ways to predict QoS
QoS – Definition
- QoS defines how well suited a node is for a given task
- QoS varies between 0 and 1
- QoS relates execution times as follows:

\[ \mathrm{QoS} = \frac{T_{\mathrm{noload}}}{T_{\mathrm{execution}}} \]

where
- T_execution = wall clock execution time for any task
- T_noload = wall clock execution time of the task on a given node without load
- QoS = Quality of Service
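
As an illustration of the definition above, the ratio can be computed directly from two wall clock measurements. This is a minimal sketch: the function name is hypothetical, and clamping the result to 1 (to absorb timing noise when the loaded run happens to be faster) is an assumption, not something stated on the slide.

```python
def qos_from_times(t_noload: float, t_execution: float) -> float:
    """QoS = T_noload / T_execution, both wall clock times in seconds."""
    if t_noload <= 0 or t_execution <= 0:
        raise ValueError("execution times must be positive")
    # Assumption: clamp to 1.0 so measurement noise cannot push QoS above 1.
    return min(1.0, t_noload / t_execution)


# A task that takes 30 s on an idle node and 60 s under load has QoS 0.5.
print(qos_from_times(30.0, 60.0))
```
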
Methodology
- Three task categories
  - CPU intensive
  - Disk IO intensive
  - Network IO intensive
- Representative probe programs for each category
- Load generating program for each category
Methodology
- Monitor system metrics
  - Load average, CPU utilization, memory utilization, disk utilization, swap utilization etc.
- Execute probe programs in different load conditions (generated using load generating programs)
- Correlate probe execution time, system metrics and no-load execution time of probe
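
One simple way to carry out the correlation step is an ordinary least-squares line fit of probe execution time against a single load metric. The sketch below assumes this approach; the slides do not say which fitting method was used, and the sample numbers are made up for illustration only. If execution time grows linearly with the metric, the intercept estimates the no-load execution time.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: load metric vs. probe execution time in seconds.
load_metric = [0.0, 1.0, 2.0, 3.0]
exec_time = [30.0, 60.0, 90.0, 120.0]
slope, intercept = fit_line(load_metric, exec_time)
print(slope, intercept)  # intercept approximates the no-load execution time
```
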
Probe Selection
- Probe should
  - Represent real world applications
  - Have a short execution time
  - Be non-interactive
- Selected probes are
  - Linpack for CPU intensive
  - Bonnie for Disk IO intensive
  - Network IO intensive (not considered)
Load Generating programs
- Generate load in a given category
  - Should have a long execution time
  - Feature for varying the load
- Two types of Disk IO load
  - Block IO (IO in large data blocks)
  - Character IO (IO in small data blocks)
SETUP
- 32 node cluster
- Each node consists of
  - [email protected] GHz
  - 640 MB memory
  - 40 GB HDD
  - Redhat Linux version 7.3
- EDG Fabric Monitoring System for gathering system metrics
CPU Probe
- CPU probe in different loading conditions
- Correlation using load average
  - Execution time varies linearly with load average
  - Problem in block IO load

\[ \mathrm{QoS} = \frac{1}{1 + \mathrm{LoadAverage}} \qquad \text{(Equation 1)} \]
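
Equation 1 can be evaluated on a live node as sketched below. Reading the load average with Python's `os.getloadavg()` is an assumption made for illustration; the work described here gathered metrics through the EDG Fabric Monitoring System. The function names are hypothetical.

```python
import os


def qos_eq1(load_average: float) -> float:
    """Equation 1: QoS = 1 / (1 + LoadAverage)."""
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (1.0 + load_average)


def current_qos_eq1() -> float:
    """QoS of the local node from its 1-minute load average (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return qos_eq1(one_minute)


# A load average of 3 predicts a QoS of 0.25, i.e. a 4x longer run time.
print(qos_eq1(3.0))
```
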
CPU Probe
[Figure: Execution Time vs 1 Min. Load Average. X-axis: 1 min. load average x 100. Series: CPU Load, Char IO Load, Block IO Load.]
CPU Probe
- Load average represents combined CPU and IO load
- CPU probe depends only on CPU load
- Two ways to achieve it
  - Average CPU load (VmStatR)
  - Calculate available CPU to probe
CPU Probe
- Average CPU Load
  - 1 minute running average of run queue
  - Called VmStatR
  - Predicted QoS will be

\[ \mathrm{QoS} = \frac{1}{1 + \mathrm{VmStatR}} \qquad \text{(Equation 2)} \]
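
A running average of the run queue can be maintained from periodic samples, as in the sketch below. How the samples are obtained (for example, from the `r` column of `vmstat`) and the class and function names are assumptions for illustration; the slides only state that VmStatR is a 1-minute running average of the run queue.

```python
from collections import deque


class RunQueueAverage:
    """Mean of run queue samples taken within the last `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.samples = deque()  # (timestamp, run_queue_length) pairs

    def add(self, timestamp: float, run_queue_length: float) -> None:
        self.samples.append((timestamp, run_queue_length))
        # Discard samples that have fallen out of the averaging window.
        while self.samples[0][0] <= timestamp - self.window:
            self.samples.popleft()

    def value(self) -> float:
        if not self.samples:
            return 0.0
        return sum(r for _, r in self.samples) / len(self.samples)


def qos_eq2(vmstat_r: float) -> float:
    """Equation 2: QoS = 1 / (1 + VmStatR)."""
    return 1.0 / (1.0 + vmstat_r)


avg = RunQueueAverage()
for t, r in [(0.0, 4), (30.0, 2), (61.0, 0)]:
    avg.add(t, r)
print(avg.value(), qos_eq2(avg.value()))
```

In this example the sample taken at t = 0 has left the 60-second window by t = 61, so only the last two samples contribute to the average.
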
CPU Probe
[Figure: Execution Time vs VmStatR. X-axis: VmStatR x 100. Series: CPU Load, Char IO Load, Block IO Load.]
CPU Probe
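- Available CPU to probe
  - Calculate using CPU utilization metric
  - Probe is eligible for
    - Available idle time
    - A share of system and user time

\[ \mathrm{QoS} = \frac{\mathrm{IdleTime} + \dfrac{\mathrm{UserTime}}{1 + \mathrm{VmStatR}} + \dfrac{\mathrm{SystemTime}}{1 + \mathrm{VmStatR}}}{100} \qquad \text{(Equation 3)} \]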
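
The sketch below evaluates Equation 3 under the reading that the probe receives all of the idle CPU time plus a 1/(1 + VmStatR) share of the user and system time, with all three utilization figures given as percentages. The function and parameter names are hypothetical.

```python
def qos_eq3(idle_pct: float, user_pct: float, system_pct: float,
            vmstat_r: float) -> float:
    """Equation 3: CPU fraction available to the probe.

    QoS = (Idle + User / (1 + VmStatR) + System / (1 + VmStatR)) / 100
    """
    share = 1.0 / (1.0 + vmstat_r)  # probe's share of the busy CPU time
    return (idle_pct + (user_pct + system_pct) * share) / 100.0


# 50% idle, 40% user, 10% system, one runnable competitor on average:
# the probe gets the idle half plus half of the busy half.
print(qos_eq3(50.0, 40.0, 10.0, 1.0))
```

Unlike Equation 1, this estimate does not drop when processes are only waiting on IO, since such processes leave the CPU idle.
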
CPU Probe
Table shows the comparison between QoS predicted using equation 1 & 3 in Block IO load
QoS using Eq. 3 shows correct characteristic
QoS using Equation 1    QoS using Equation 3    Execution Time (Sec)
0.2433                  0.4300487               32
0.1605                  0.4375441               31
0.1329                  0.4624468               32
0.1136                  0.415                   30
0.1042                  0.4536079               31
0.0952                  0.4290476               30
0.0869                  0.4430435               31
Comparison of results
- Compared the QoS results obtained using the three equations for the CPU probe under different loads
  - Equation 1 does not give a correct prediction under block IO load conditions
  - Equations 2 and 3 give acceptable results under any load condition
CPU Probe – Comparison of results
[Figure: Comparison of the Measured and Predicted Exec Time for CPU Probe. Load conditions (1 to 4): LC+LB, LC, LC+LCh, LCh+LB. Series: Measured Exec Time, Equation 1, Equation 2, Equation 3.]
- LC: CPU Load
- LC+LB: CPU + Block IO Load
- LC+LCh: CPU + Character IO Load
- LCh+LB: Character + Block IO Load
Disk IO Probe
- Modified 'Bonnie' to act as both a block IO and a character IO probe
- Considered the block IO probe, as most of the applications were block IO intensive
- Correlated probe execution time under different loading conditions
- Predicted QoS using the three equations and compared results
Disk IO Probe – Comparison of results
[Figure: Comparison of Measured and Predicted Execution Time of Block IO Probe. Load conditions (1 to 4): LC+LB, LC, LC+LCh, LCh+LB. Series: Measured Exec Time, Equation 1, Equation 2, Equation 3.]
- LC: CPU Load
- LC+LB: CPU + Block IO Load
- LC+LCh: CPU + Character IO Load
- LCh+LB: Character + Block IO Load
CMSIM Results
- Predicted execution time using QoS from Equation 2
- % error relative to the measured execution time is acceptable
Measured Execution Time (Sec)    Predicted Execution Time (Sec)    % Error
585                              610.8687                           4.422
739                              744.3209                           0.720017
929                              934.466                            0.588377
1082                             1080.702                          -0.11999
1230                             1216.43                           -1.10328
1413                             1381.166                          -2.25294
1687                             1707.317                           1.204332
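
The % Error column above is consistent with (predicted - measured) / measured x 100, and a predicted execution time follows from the QoS definition as T_noload / QoS. The sketch below shows both calculations; the function names are hypothetical, and the no-load time used in the second call is an illustrative value, not one taken from the slides.

```python
def predicted_time(t_noload: float, qos: float) -> float:
    """Predicted wall clock time, from QoS = T_noload / T_execution."""
    if not 0.0 < qos <= 1.0:
        raise ValueError("QoS must be in the range (0, 1]")
    return t_noload / qos


def percent_error(measured: float, predicted: float) -> float:
    """Signed error of the prediction relative to the measured time."""
    return (predicted - measured) / measured * 100.0


# First row of the CMSIM table: measured 585 s, predicted 610.8687 s.
print(round(percent_error(585.0, 610.8687), 3))
# Illustrative only: a 30 s no-load task on a node with QoS 0.5.
print(predicted_time(30.0, 0.5))
```
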
Problem Areas
- Effect of swapping
  - If available memory is less than the size of task
  - Linux kernel dynamically changes the priorities of tasks and swaps tasks accordingly
  - Difficult to predict QoS
Problem Areas – Swapping
[Figure: Variation of Memory, Swap and Exec Time. X-axis: Sample No. (1 to 14). Series: % Used Memory, % Used Swap, Exec Time.]
Problem Areas
- Metric sampling frequency of monitoring system
  - Immediate metric value ensures better QoS prediction
  - At higher sampling frequency, monitoring itself loads the node
- Change in state after submission of task
  - QoS can't consider load changes after submission of task
  - Submission/removal of other tasks may change QoS
Conclusion
- Equations 2 and 3 provide better QoS predictions for CPU bound applications
- Equation 1 can be used for IO bound applications
- Execution time successfully predicted for CMSIM, which is a mostly CPU bound job
- Load balancing programs can use the derived equations for job submission
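
As a sketch of how a load balancer might use the derived equations, the function below ranks candidate nodes by the QoS predicted from Equation 2 and picks the best one. The node names, the metric values, and the selection policy are all hypothetical; the slides do not describe a specific scheduler integration.

```python
def pick_node(vmstat_r_by_node: dict) -> str:
    """Return the node with the highest QoS = 1 / (1 + VmStatR)."""
    if not vmstat_r_by_node:
        raise ValueError("no candidate nodes")
    return max(vmstat_r_by_node,
               key=lambda node: 1.0 / (1.0 + vmstat_r_by_node[node]))


# Hypothetical VmStatR readings for three worker nodes.
readings = {"node01": 2.0, "node02": 0.5, "node03": 1.0}
print(pick_node(readings))
```

Because the prediction reflects node state only at submission time, a scheduler using it would still be subject to the problem areas listed earlier, such as load changes after the job starts.
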
Thanks