AMIR RACHUM, CHAI RONEN
FINAL PRESENTATION
INDUSTRIAL SUPERVISOR: DR. ROEE ENGELBERG, LSI
Optimized Caching Policies for Storage Systems
Introduction – Storage Tiering
System data is stored over different types of storage devices.
Generally speaking, in data storage, for a given price, the higher the speed, the lower the volume.
The idea is to enable the use of larger, low-cost disk space while retaining the benefits of high-speed hardware, optimizing data storage for the fastest overall disk access.
This requires a dynamic algorithm for managing (migrating) the data across the tiers.
SSD: High Cost, High Performance, Low Volume
SATA Drive: Low Cost, Low Performance, High Volume
Goals
Creating a platform that allows us to test different algorithms in system-specific scenarios.
Testing several algorithms and finding the optimal one among them for storage tiering in different scenarios.
Methodology
We coded a simulator that represents the platform running the tiered storage system.
We created several data structures that represent the data on the system, track its location at all times, record read/write operations, and capture several other unique features.
We used a recording of real I/O calls from such a system to simulate an actual scenario.
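The replay idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the project's actual code: the names (`Simulator`, `IoCall`, `tierOf`) are assumptions. Each recorded call is mapped to a chunk by its offset, the chunk's current tier is looked up (chunks never migrated live on a default tier, here SATA), and per-tier counters are updated.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class Op { Read, Write };
enum class Tier { SSD, SATA };

struct IoCall {
    Op op;
    std::uint64_t offset;  // byte offset of the recorded I/O call
};

struct TierStats {
    std::uint64_t reads = 0;
    std::uint64_t writes = 0;
};

class Simulator {
public:
    explicit Simulator(std::uint64_t chunk_size) : chunk_size_(chunk_size) {}

    // Replay a recorded trace, counting reads/writes against the tier
    // each chunk currently resides on.
    void replay(const std::vector<IoCall>& trace) {
        for (const IoCall& call : trace) {
            std::uint64_t chunk = call.offset / chunk_size_;
            TierStats& s = (tierOf(chunk) == Tier::SSD) ? ssd_ : sata_;
            if (call.op == Op::Read) ++s.reads; else ++s.writes;
        }
    }

    // Only non-default locations are stored; anything absent from the map
    // is assumed to be on the default (SATA) tier, keeping the map small.
    Tier tierOf(std::uint64_t chunk) const {
        auto it = location_.find(chunk);
        return it == location_.end() ? Tier::SATA : it->second;
    }

    const TierStats& stats(Tier t) const {
        return t == Tier::SSD ? ssd_ : sata_;
    }

private:
    std::uint64_t chunk_size_;
    std::unordered_map<std::uint64_t, Tier> location_;
    TierStats ssd_, sata_;
};
```

Storing only non-default locations is one way to realize the low disk-space usage mentioned later under Accomplishments.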
Accomplishments
Created an Algorithm interface that supports any algorithm, multiple tiers and multiple platform data structures.
Our design is generic enough to enable very easy addition of usage statistics and platform data.
The CLI enabled quick specification of the input file, chunk size, and tier information.
Varying chunk size let us research the effect of the size on run time and algorithm effectiveness.
We implemented 2 caching algorithms: a "naïve" algorithm that transfers every chunk to the top tier upon I/O, and a more efficient algorithm that minimizes migrations.
Smart implementation resulted in low disk space usage for the various data structures (we used a default tier).
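The Algorithm interface and the naïve policy above can be sketched like this. The names (`Algorithm`, `onAccess`, `CacheLruNaive`) are illustrative assumptions, not the project's actual API: a policy sees every I/O on a chunk and returns the tier the chunk should reside on, and the naïve policy always promotes to the top tier.

```cpp
#include <cstdint>

enum class Tier { SSD, SATA };

// Generic policy interface: any algorithm, any decision logic.
class Algorithm {
public:
    virtual ~Algorithm() = default;
    // Called for every I/O operation on a chunk; returns the tier the
    // chunk should reside on after this access.
    virtual Tier onAccess(std::uint64_t chunk, Tier current) = 0;
};

// Naive policy: promote every accessed chunk to the top tier,
// so every access to a chunk not already on SSD costs a migration.
class CacheLruNaive : public Algorithm {
public:
    Tier onAccess(std::uint64_t /*chunk*/, Tier current) override {
        if (current != Tier::SSD) ++migrations_;
        return Tier::SSD;
    }
    std::uint64_t migrations() const { return migrations_; }

private:
    std::uint64_t migrations_ = 0;
};
```

A smarter policy would return `Tier::SSD` only when a chunk's access history justifies the migration cost, which is how it can cut the migration count while serving a similar number of reads from the top tier.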
Algorithm conclusions
We ran 3 different scenarios:
Small chunk size (16B), small SSD size (64B, 4× chunk size)
Large chunk size (2048B), (relatively) small SSD size (8192B, 4× chunk size)
Small chunk size (16B), relatively large SSD size (8192B, 512× chunk size)
Algorithm conclusions
When using an extremely small SSD (4× chunk size), both caching algorithms are ineffective: the naïve one showed a high number of reads from the higher tier, yet had twice as many migrations between tiers.
The smart algorithm, despite having half the migrations of the naïve algorithm, showed very little reading from the higher tier.
In this case, the dummy algorithm proved very efficient, as it saved all the time needed for relatively useless migrations.
Algorithm Conclusions (16/64)
[Bar chart: counts (0–160,000) of SATA/R, SATA/W, SSD/R, SSD/W, SATA → SSD, and SSD → SATA operations for the Dummy, CacheLruNaive, and CacheLruSmart algorithms]
Algorithm conclusions
When running with a large chunk size and a 4× SSD size, the caching algorithms achieved much better results than the dummy algorithm. However, the 2 caching algorithms did not differ between themselves.
Algorithm Conclusions (2048/8192)
[Bar chart: counts (0–12,000) of SATA/R, SATA/W, SSD/R, SSD/W, SATA → SSD, and SSD → SATA operations for the Dummy, CacheLruNaive, and CacheLruSmart algorithms]
Algorithm conclusions
Running with a small chunk size and a large SSD size, the 2 caching algorithms also gave similar results. However, they were far inferior to the results of the previous run.
Algorithm Conclusions (16/8192)
[Bar chart: counts (0–160,000) of SATA/R, SATA/W, SSD/R, SSD/W, SATA → SSD, and SSD → SATA operations for the Dummy, CacheLruNaive, and CacheLruSmart algorithms]
General Conclusions
Chunk size greatly affects the runtime of the platform, but a "standard" size does not take long to run.
Smart usage of Boost greatly decreased the work required and was very effective.
Good implementation can result in huge disk space saving.
Despite the data structures already provided by the platform, most non-naïve algorithms also need their own data structure of some sort.
Working with Git source control proved to be very helpful:
Retrieving old code that was once thought to be obsolete.
Collaboration.
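The point above about non-naïve algorithms needing their own bookkeeping can be illustrated with the kind of recency structure an LRU policy keeps on top of the platform's data: a list ordered by recency plus an index into it, so both touching a chunk and evicting the least-recently-used one are O(1). This is a generic LRU sketch, not the project's implementation; all names are illustrative.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

class LruTracker {
public:
    // Mark a chunk as most recently used, inserting it if new.
    void touch(std::uint64_t chunk) {
        auto it = index_.find(chunk);
        if (it != index_.end()) order_.erase(it->second);
        order_.push_front(chunk);
        index_[chunk] = order_.begin();
    }

    // Return and remove the least-recently-used chunk (back of the list).
    std::uint64_t evict() {
        std::uint64_t victim = order_.back();
        order_.pop_back();
        index_.erase(victim);
        return victim;
    }

    std::size_t size() const { return order_.size(); }

private:
    std::list<std::uint64_t> order_;  // front = most recently used
    std::unordered_map<std::uint64_t,
                       std::list<std::uint64_t>::iterator> index_;
};
```

`std::list` is used because erasing an element via a stored iterator does not invalidate the other iterators held in the index map.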