Cloud workload analysis and simulation
Cloud Computing Project B: Cloud workload analysis and simulation
Group 3: Abinaya Shanmugaraj, Arunraja Srinivasan, Prabhakar Ganesamurthy, Priyanka Mehta
Instructor : Dr. I-Ling Yen
TA : Elham Rezvani
Overview
• Dataset preprocessing
• Dataset Analysis and Observations
• Important attributes in dataset
• Categorization of users and tasks
• Time series analysis
• Workload prediction
• Looking Ahead
Dataset pre-processing
• Inconsistent and vague data was cleaned before analysis.
• The task-usage table has many records for the same JobID-task index pair, because a task may be re-submitted or re-scheduled after a failure.
• Pre-processing was done to avoid reading many values for the same JobID-task index pair.
• Pre-processing: all records were grouped by JobID-task index, and only the last occurring record of each repeated task was kept and stored as a single record.
• Time is in microseconds in the dataset.
• Pre-processing: time was converted into days and hours for per-day analysis.
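The two pre-processing steps above can be sketched as follows. This is a minimal Python illustration, not the group's actual code. The field names `job_id` and `task_index`, the 1-based day numbering, and the assumption that records arrive in file order are all hypothetical.

```python
MICROS_PER_HOUR = 3_600 * 1_000_000
MICROS_PER_DAY = 24 * MICROS_PER_HOUR


def keep_last_record(records):
    """Keep only the last record seen for each (job_id, task_index) pair.

    Assumes `records` is ordered the way the rows appear in the table,
    so a later row for the same pair overwrites the earlier one.
    """
    latest = {}
    for rec in records:
        latest[(rec["job_id"], rec["task_index"])] = rec
    return list(latest.values())


def to_day_and_hour(timestamp_us):
    """Convert a microsecond timestamp into (day, hour); day is 1-based (assumed)."""
    day = timestamp_us // MICROS_PER_DAY + 1
    hour = (timestamp_us % MICROS_PER_DAY) // MICROS_PER_HOUR
    return day, hour
```

A plain dictionary keyed on the pair is enough here because only the final record per pair is needed.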
Dataset analysis and observation
• The data in the tables was visualized.
• Attributes that were constant, or stayed within a small range of values for most records, were not considered for analysis.
• Attributes that play a major part in shaping the user profile and task profile were treated as important attributes.
• The main attributes of each table were analyzed and visualized, and observations were drawn from them.
Data Analysis and Observation
Ignored attribute (example) – Memory accesses per instruction (MAI)
Memory accesses per instruction Vs Tasks per JobID – Except for a few tasks, MAI is almost the same for all tasks.
Job Events table
Attributes considered: time, JobID, event type, user.
• These attributes were extracted from the CSV files using Java code.
• To find the number of jobs submitted per day and per user, only records with event type = 0 were considered, as '0' means a job was submitted by the user.
• Time in microseconds was converted into days.
Visualizations: jobs submitted per day, per user.
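A minimal sketch of the counting step described above. It is written in Python for brevity, although the slides say Java was used. The tuple layout `(time_us, job_id, event_type, user)` and the 1-based day index are assumptions made for illustration.

```python
from collections import Counter

MICROS_PER_DAY = 86_400 * 1_000_000
SUBMIT = 0  # event type 0 = job submitted, as stated in the slides


def count_job_submissions(job_events):
    """Count submitted jobs per day and per user.

    job_events: iterable of (time_us, job_id, event_type, user) tuples
    (hypothetical layout).
    """
    per_day = Counter()
    per_user = Counter()
    for time_us, job_id, event_type, user in job_events:
        if event_type != SUBMIT:
            continue  # ignore scheduling, failure, and other events
        per_day[time_us // MICROS_PER_DAY + 1] += 1
        per_user[user] += 1
    return per_day, per_user
```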
Task Events table
Attributes considered: time, JobID, task index, event type, user, CPU request, memory request, disk space request.
• Using records where event type = 0, the number of tasks per day and per user was visualized.
• Using the distinct count of users, the number of users per day was visualized.
Average tasks per day = 1,607,694
Average users per day = 398
Visualizations: number of tasks per day and per user, number of users per day, user submission rate (total number of tasks submitted / 30), average memory requested per user, average CPU requested per user, average tasks per job per user.
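The distinct-user count and the submission rate mentioned above could be computed along these lines. This is a hedged Python sketch. The `(time_us, user, event_type)` tuple layout is hypothetical, and the divisor of 30 comes from the formula in the slides.

```python
from collections import Counter, defaultdict

MICROS_PER_DAY = 86_400 * 1_000_000
SUBMIT = 0        # event type 0 = submitted
TRACE_DAYS = 30   # divisor used in the slides' submission-rate formula


def users_per_day(task_events):
    """Distinct users that submitted at least one task on each day."""
    seen = defaultdict(set)
    for time_us, user, event_type in task_events:
        if event_type == SUBMIT:
            seen[time_us // MICROS_PER_DAY + 1].add(user)
    return {day: len(users) for day, users in seen.items()}


def submission_rate(task_events, trace_days=TRACE_DAYS):
    """Tasks submitted per day for each user: total submitted / trace_days."""
    submitted = Counter(
        user for _, user, event_type in task_events if event_type == SUBMIT
    )
    return {user: count / trace_days for user, count in submitted.items()}
```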
Tasks per day Vs Jobs per day
[Figure: count of task index (0M to 5M) and distinct count of JobID (0K to 40K) for each day, days 1 to 30.]
Observation: The visualization shows only a loose correlation between jobs/day and tasks/day (fewer jobs does not mean fewer tasks).
Tasks per day Vs Users per day
[Figure: count of task index (0M to 5M) and distinct count of users (0 to 500) for each day, days 1 to 30.]
Observation: The visualization shows only a loose correlation between tasks/day and users/day. There is a weekly pattern in users/day: every 7th day has fewer users (possibly a weekend). The type of user matters more than the number of users/day when predicting the number of tasks/day.
User submission rate (tasks/day)
Observation: A few users have a very high submission rate.
Avg. tasks/job per user
Observation: Most jobs submitted by a given user are similar, as the jobs contain the same number of tasks.
Machine Events
Attributes considered: time, machine ID, event type, CPU, memory.
• Records with event type = 0 give the machines that were added to the cluster and are available.
• Records with event type = 1 give the machines that were removed due to failure.
• Records with event type = 2 give the machines whose attributes were updated.
• This data is of less significance for our project.
Task usage
Attributes considered: start time, end time, job ID, task index, CPU rate, canonical memory usage, assigned memory usage, local disk space usage.
• Using these attributes, task length (running time * CPU rate) was computed. Running time was converted from microseconds to seconds.
• The user data from the task events table was extracted to get the average memory and CPU used per user.
Visualization: average CPU used per user, average memory used per user.
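The task-length formula and the per-user averages above can be illustrated with a small Python sketch. The row layout and the `task_owner` lookup (a mapping from a JobID-task index pair to its user, built from the task events table) are assumptions, not details given in the slides.

```python
from collections import defaultdict


def task_length(start_us, end_us, cpu_rate):
    """Task length = running time in seconds * CPU rate."""
    running_seconds = (end_us - start_us) / 1_000_000
    return running_seconds * cpu_rate


def average_usage_per_user(usage_rows, task_owner):
    """Average CPU and memory used per user.

    usage_rows: iterable of (job_id, task_index, cpu_rate, memory_usage).
    task_owner: dict mapping (job_id, task_index) -> user (hypothetical,
    assumed to be derived from the task events table).
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # cpu sum, memory sum, row count
    for job_id, task_index, cpu_rate, memory_usage in usage_rows:
        user = task_owner.get((job_id, task_index))
        if user is None:
            continue  # no matching task event, so the row cannot be attributed
        entry = totals[user]
        entry[0] += cpu_rate
        entry[1] += memory_usage
        entry[2] += 1
    return {user: (cpu / n, mem / n) for user, (cpu, mem, n) in totals.items()}
```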
CPU requested per user Vs CPU used per user
Observation: Most users overestimate the resources they need and use less than 5% of the requested resources. A few users underestimate and use more than three times the requested amount.
Memory requested per user Vs Memory used per user
Observation: Most users overestimate the resources they need and use less than 30% of the requested resources. Very few users underestimate and use more than the requested amount, but tasks that use more memory than requested get killed.
Important Attributes
• Attributes that play an important part in identifying user and task shape.
• From the visualizations and observations made, the following were identified as important attributes:
• User: submission rate, CPU estimation ratio, memory estimation ratio
Estimation ratio = (requested resource - used resource) / requested resource
• Task: task length, CPU usage, memory usage
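As a worked example of the estimation ratio defined above, here is a short Python sketch. The function name and the rejection of non-positive requests are illustrative choices, not part of the slides.

```python
def estimation_ratio(requested, used):
    """Estimation ratio = (requested - used) / requested.

    Close to 1: almost all of the request went unused (overestimate).
    Negative: more was used than requested (underestimate).
    """
    if requested <= 0:
        raise ValueError("requested resource must be positive")
    return (requested - used) / requested


# A user who requests 4 units and uses 1 leaves 75% of the request unused.
over = estimation_ratio(requested=4, used=1)    # 0.75
# A user who requests 2 units and uses 6 used three times the request.
under = estimation_ratio(requested=2, used=6)   # -2.0
```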
CPU Estimation ratio per User
Users with a negative (red) CPU estimation ratio have used more resources than requested. Users with a CPU estimation ratio between 0.9 and 1 have left more than 90% of the requested resource unused.
Memory Estimation ratio per User
Users with a negative (orange) memory estimation ratio have used more resources than requested. Users with a memory estimation ratio between 0.9 and 1 have left more than 90% of the requested resource unused.
Categorization of Users
Categorization of Tasks
Dimensions for categorization
User: submission rate, CPU estimation ratio, memory estimation ratio
Task: task length, CPU usage, memory usage
We use the following clustering algorithms to identify the optimal number of clusters for users and tasks:
1. K-means
2. Expectation-Maximization (EM)
3. Cascade Simple K-means
4. X-means
• We categorize the users and tasks using these clustering algorithms with the above dimensions for users and tasks.
• We compare the results and choose the best clustering for users and tasks.
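To make the categorization step concrete, here is a minimal sketch of plain K-means (Lloyd's algorithm) over three-dimensional points such as (memory estimation ratio, submission rate, CPU estimation ratio). The slides do not say which tool or implementation was used, so this pure-Python version is only an illustration. It also assumes the three dimensions have already been scaled to comparable ranges.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means: returns (centroids, assignment) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

Running the same data through several values of `k` and comparing the resulting clusters is one simple way to mirror the comparison step described above.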
User Categorization
Users - K-means with 4 clusters
X : Avg. memory est. ratio Y: Submission rate Z: Avg. CPU est. ratio
Tasks Categorization
Tasks - Day 13 - K-means (3 clusters)
X: Memory usage Y: Length Z: CPU usage
Tasks - Day 13 - X-means
X: Memory usage Y: Length Z: CPU usage
Clustering comparison:
Our clustering (X-means)
K-means clustering as done in the IEEE paper "An Approach for Characterizing Workloads in Google Cloud to Derive Realistic Resource Utilization Models"
Selected User and Task clustering
Users - K-means with 4 clusters
X: Avg. memory est. ratio Y: Submission rate Z: Avg. CPU est. ratio
Tasks - X-means with 3 clusters
X: Memory usage Y: Length Z: CPU usage
Time Series Analysis
Selecting Target Users & Tasks
From the clustering results we observed:
• 97% of the users have estimation ratios ranging from 0.7 to 1.0.
• That is, 97% of the users leave at least 70% of the resources they request unused.
• We targeted User Cluster 0 and Cluster 3 (more than 90% unused).
We targeted tasks that were long enough for efficient resource allocation:
• We performed clustering on the task lengths of these users to filter out short tasks.
User workload analysis – Dynamic Time Warping
To identify a user's tasks with similar workloads, we ran the DTW algorithm on each task of the Cluster 0 and Cluster 3 users:
• Computed the DTW distance between each of the user's tasks and a reference curve.
• Extracted the tasks of a user that have the same DTW value.
• These tasks were identified as having similar workload curves.
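The DTW comparison described above can be sketched with the classic dynamic-programming formulation. This is an illustrative Python version, not the group's code. It assumes workload curves are lists of numbers, and that "same DTW value" means equal after rounding. The `decimals` parameter is hypothetical.

```python
from collections import defaultdict


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b[j-1] is matched again
                cost[i][j - 1],      # b advances, a[i-1] is matched again
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def group_by_dtw(task_curves, reference, decimals=6):
    """Group task ids whose DTW distance to the reference curve is equal.

    task_curves: dict mapping task id -> workload curve (list of numbers).
    Distances are rounded to `decimals` places before comparison (assumed).
    """
    groups = defaultdict(list)
    for task_id, curve in task_curves.items():
        groups[round(dtw_distance(curve, reference), decimals)].append(task_id)
    return dict(groups)
```

Because DTW allows stretching along the time axis, a curve that follows the reference shape at a different pace still gets a distance of zero.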
Workload prediction
Resource allocation and de-allocation cannot be done dynamically because of:
• Huge overhead
• Delay in allocating resources
So resource allocation must happen once in every pre-determined interval of time.
Prediction:
• When a predictable user runs a task, its initial workload is compared with the reference curve associated with that user.
• Based on the slope of the predicted workload curve (the reference curve), a step-up or step-down in resource allocation is determined, taking the delay in resource allocation into account.
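One possible reading of the slope-based decision is sketched below in Python. The slides give no concrete rule, so everything here is an assumption: the function name, measuring the allocation delay in curve samples (`delay_steps`), the `threshold` dead band, and the "hold" outcome.

```python
def plan_allocation(reference_curve, current_index, delay_steps, threshold=0.0):
    """Decide 'step-up', 'step-down', or 'hold' from the reference curve.

    The slope is measured between the current sample and the sample at
    which a new allocation would take effect (current_index + delay_steps),
    so the allocation delay is accounted for. All names are hypothetical.
    """
    target = min(current_index + delay_steps, len(reference_curve) - 1)
    if target <= current_index:
        return "hold"  # no future samples left to look at
    slope = (reference_curve[target] - reference_curve[current_index]) / (
        target - current_index
    )
    if slope > threshold:
        return "step-up"
    if slope < -threshold:
        return "step-down"
    return "hold"
```

Calling this once per allocation interval matches the constraint that resources are adjusted only at pre-determined intervals.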
Looking ahead…
• When the unhashed job name and user name are known, associations between a job name and its workload can be formed and used for better prediction.
• As observed in the user clustering, most users have poor estimation ratios. Better resource estimation processes could help users achieve better estimation ratios.
• More techniques, such as regression analysis and curve-fitting algorithms, can be used to get a better representative curve for a predictable user.
Thank you