AMIR RACHUM, CHAI RONEN: FINAL PRESENTATION. INDUSTRIAL SUPERVISOR: DR. ROEE ENGELBERG, LSI. Optimized Caching Policies for Storage Systems


Page 1

AMIR RACHUM, CHAI RONEN

FINAL PRESENTATION

INDUSTRIAL SUPERVISOR: DR. ROEE ENGELBERG, LSI

Optimized Caching Policies for Storage Systems

Page 2

Introduction – Storage Tiering

System data is stored across different types of storage devices.

Generally speaking, in data storage, for a given price, the higher the speed, the lower the volume.

The idea is to enable the use of larger, low-cost disk space while keeping the benefits of high-speed hardware, optimizing data storage for the fastest overall disk access.

This requires a dynamic algorithm for managing (migrating) the data across the tiers.

SSD: high cost, high performance, low volume

SATA drive: low cost, low performance, high volume
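The migration step can be sketched as a minimal two-tier placement table. This is a sketch under stated assumptions, not the project's code. Data is addressed in fixed-size chunks numbered by a 64-bit index, and the SSD holds a fixed number of chunks. `Placement`, `promote` and `demote` are hypothetical names. The table records only SSD-resident chunks, and any chunk not listed is implicitly on SATA. This is one plausible reading of the "default tier" idea mentioned under Accomplishments; the slides do not show how the authors implemented it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// The two tiers from the slide. Chunks default to the large, cheap SATA tier.
enum class Tier { Ssd, Sata };

// Hypothetical placement table for a two-tier system. Only chunks that live
// on the SSD are stored; every other chunk is implicitly on SATA, so memory
// use is bounded by the SSD capacity rather than by the volume size.
class Placement {
public:
    explicit Placement(std::size_t ssdCapacityChunks)
        : capacity_(ssdCapacityChunks) {}

    Tier tierOf(std::uint64_t chunk) const {
        return onSsd_.count(chunk) != 0 ? Tier::Ssd : Tier::Sata;
    }

    // Migrate a chunk SATA -> SSD. Returns false if it is already on the SSD
    // or the SSD is full (a policy must demote something first).
    bool promote(std::uint64_t chunk) {
        if (onSsd_.count(chunk) != 0 || onSsd_.size() >= capacity_) return false;
        onSsd_.insert(chunk);
        ++promotions_;
        return true;
    }

    // Migrate a chunk SSD -> SATA. Returns false if it was not on the SSD.
    bool demote(std::uint64_t chunk) {
        if (onSsd_.erase(chunk) == 0) return false;
        ++demotions_;
        return true;
    }

    std::uint64_t promotions() const { return promotions_; }
    std::uint64_t demotions() const { return demotions_; }

private:
    std::size_t capacity_;                     // SSD size, in chunks
    std::unordered_set<std::uint64_t> onSsd_;  // chunks currently on the SSD
    std::uint64_t promotions_ = 0;             // SATA -> SSD migrations
    std::uint64_t demotions_ = 0;              // SSD -> SATA migrations
};
```

The table only enforces capacity and counts migrations. A tiering policy decides when to call `promote` and `demote`, which keeps the migration cost of each policy measurable in the same way.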

Page 3

Goals

Creating a platform which will allow us to test different algorithms in system-specific scenarios.

Testing several algorithms and finding the optimal algorithm amongst them for storage tiering in different scenarios.

Page 4

Methodology

We coded a simulator that represents the platform running the tiered storage system.

We created several data structures that represent the data on the system and track its location at all times, record read/write operations, and capture several other unique features.

We used a recording of real I/O calls for such a system to simulate an actual scenario.
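The replay of a recorded I/O trace can be sketched as follows. This is a minimal sketch, not the project's simulator. The trace format (one `<R|W> <byte offset> <length>` request per line) is an assumption, as are the names `TierStats`, `AccessFn` and `replayTrace`. Each request is split into the fixed-size chunks it overlaps, and every chunk access is charged to the tier that served it. The counters mirror the SATA/R, SATA/W, SSD/R and SSD/W categories on the result charts.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <istream>
#include <sstream>

// Per-tier access counters, matching the SATA/R, SATA/W, SSD/R and SSD/W
// categories on the result charts.
struct TierStats {
    std::uint64_t sataReads = 0, sataWrites = 0;
    std::uint64_t ssdReads = 0, ssdWrites = 0;
};

// Called once per chunk access; returns true if the chunk was served by the
// SSD. A real policy would also perform migrations inside this callback.
using AccessFn = std::function<bool(std::uint64_t chunk, bool isWrite)>;

// Replays a trace in an ASSUMED text format, one request per line:
//   <R|W> <byte offset> <length in bytes>
// Each request is split into the fixed-size chunks it overlaps.
TierStats replayTrace(std::istream& trace, std::uint64_t chunkSize,
                      const AccessFn& access) {
    TierStats stats;
    char op = 0;
    std::uint64_t offset = 0, length = 0;
    while (trace >> op >> offset >> length) {
        if (length == 0) continue;
        const bool isWrite = (op == 'W' || op == 'w');
        const std::uint64_t first = offset / chunkSize;
        const std::uint64_t last = (offset + length - 1) / chunkSize;
        for (std::uint64_t chunk = first; chunk <= last; ++chunk) {
            if (access(chunk, isWrite)) {
                ++(isWrite ? stats.ssdWrites : stats.ssdReads);
            } else {
                ++(isWrite ? stats.sataWrites : stats.sataReads);
            }
        }
    }
    return stats;
}
```

With a callback that always returns false, this reproduces a no-caching baseline. For example, a 32-byte read at offset 0 with 16-byte chunks is counted as two SATA reads.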

Page 5

Accomplishments

Created an Algorithm interface that supports any algorithm, multiple tiers and multiple platform data structures.

Our design is generic enough that new usage statistics and platform data can be added very easily.

The command-line interface allows quick specification of the input file, chunk size and tier information.

Varying the chunk size let us study its effect on run time and algorithm effectiveness.

We implemented 2 caching algorithms: a “naïve” algorithm that transfers every accessed chunk to the top tier upon I/O, and a more efficient algorithm that minimizes migrations.

Careful implementation kept the disk space used by the various data structures low (by using a default tier).
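The chart legends later in the deck (CacheLruNaive, CacheLruSmart) suggest that the caching policies are LRU-based. The sketch below shows one way to build a pluggable policy interface, with a dummy baseline and the naïve "promote on every I/O" policy. The class and method names are hypothetical, since the slides do not show the real interface. The more efficient algorithm is not sketched, because the slides give no details of how it minimizes migrations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical policy interface: the simulator reports each chunk access and
// the policy answers whether the access was served by the top tier (SSD).
class TieringPolicy {
public:
    virtual ~TieringPolicy() = default;
    virtual bool onAccess(std::uint64_t chunk) = 0;

    std::uint64_t promotions = 0;  // bottom -> top tier migrations
    std::uint64_t demotions = 0;   // top -> bottom tier migrations
};

// Baseline: never migrates, so every access is served by the bottom tier.
class DummyPolicy : public TieringPolicy {
public:
    bool onAccess(std::uint64_t /*chunk*/) override { return false; }
};

// Naive LRU caching: every accessed chunk is moved to the top tier; when the
// top tier is full, the least recently used chunk is moved back down.
class NaiveLruPolicy : public TieringPolicy {
public:
    explicit NaiveLruPolicy(std::size_t capacityChunks)
        : capacity_(capacityChunks) {}

    bool onAccess(std::uint64_t chunk) override {
        auto it = pos_.find(chunk);
        if (it != pos_.end()) {
            // Hit: mark the chunk as most recently used.
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (capacity_ == 0) return false;  // no top tier to cache into
        if (lru_.size() >= capacity_) {
            // Top tier full: demote the least recently used chunk.
            pos_.erase(lru_.back());
            lru_.pop_back();
            ++demotions;
        }
        // Miss: served from the bottom tier, then the chunk is promoted.
        lru_.push_front(chunk);
        pos_[chunk] = lru_.begin();
        ++promotions;
        return false;
    }

private:
    std::size_t capacity_;
    std::list<std::uint64_t> lru_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> pos_;
};
```

The list keeps recency order and the hash map gives constant-time lookup. This is an example of an algorithm needing its own data structure on top of the platform's.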

Page 6

Algorithm conclusions

We ran 3 different scenarios:

Small chunk size (16B), small SSD size (64B, 4 × chunk size)

Large chunk size (2048B), (relatively) small SSD size (8192B, 4 × chunk size)

Small chunk size (16B), relatively large SSD size (8192B, 512 × chunk size)
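The multipliers in the scenario list give the SSD size in chunks, that is, how many chunks the top tier can hold at once:

```latex
\frac{64\,\mathrm{B}}{16\,\mathrm{B}} = 4, \qquad
\frac{8192\,\mathrm{B}}{2048\,\mathrm{B}} = 4, \qquad
\frac{8192\,\mathrm{B}}{16\,\mathrm{B}} = 512
```

The first two scenarios therefore differ in chunk size and in SSD size in bytes, but give the SSD the same capacity measured in chunks.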

Page 7

Algorithm conclusions

With an extremely small SSD (4 × chunk size), both caching algorithms are ineffective:

The naïve algorithm showed a high number of reads from the higher tier, but had twice as many migrations between tiers.

The smart algorithm, despite having half as many migrations as the naïve algorithm, showed very little reading from the higher tier.

In this case, the dummy algorithm proved very efficient, as it saved all the time otherwise spent on relatively useless migrations.
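A toy access pattern shows why a tiny SSD hurts the naïve policy. The pattern is an assumption chosen for illustration, not taken from the recorded trace: five chunks are accessed cyclically against a 4-chunk SSD. Under LRU, each chunk is evicted just before it is needed again, so every access misses and also costs a migration. One more chunk of capacity turns almost every access into a hit. `simulateNaiveLru` and `LruCounts` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

struct LruCounts {
    std::uint64_t hits = 0, promotions = 0, demotions = 0;
};

// Replays chunk accesses against an LRU top tier holding `capacity` chunks,
// promoting on every miss (the naive policy) and demoting the LRU chunk
// when the top tier is full.
LruCounts simulateNaiveLru(const std::vector<std::uint64_t>& accesses,
                           std::size_t capacity) {
    LruCounts n;
    std::list<std::uint64_t> lru;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> pos;
    for (std::uint64_t c : accesses) {
        auto it = pos.find(c);
        if (it != pos.end()) {
            ++n.hits;
            lru.splice(lru.begin(), lru, it->second);
            continue;
        }
        if (capacity == 0) continue;
        if (lru.size() == capacity) {
            pos.erase(lru.back());
            lru.pop_back();
            ++n.demotions;
        }
        lru.push_front(c);
        pos[c] = lru.begin();
        ++n.promotions;
    }
    return n;
}
```

With three passes over chunks 0 to 4, a 4-chunk SSD gets no hits and performs 15 promotions and 11 demotions. A 5-chunk SSD gets 10 hits after 5 promotions. A policy that never migrates has no hits either, but spends nothing on migrations. This is consistent with the dummy algorithm proving efficient in this scenario.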

Page 8

Algorithm Conclusions (16/64)

[Bar chart comparing the Dummy, CacheLruNaive and CacheLruSmart algorithms on SATA/R, SATA/W, SSD/R, SSD/W, SATA --> SSD and SSD --> SATA; y-axis 0 to 160,000]

Page 9

Algorithm conclusions

With a large chunk size and an SSD of 4 × chunk size, the caching algorithms achieved much better results than the dummy algorithm. However, the 2 caching algorithms did not differ from each other.

Page 10

Algorithm Conclusions (2048/8192)

[Bar chart comparing the Dummy, CacheLruNaive and CacheLruSmart algorithms on SATA/R, SATA/W, SSD/R, SSD/W, SATA --> SSD and SSD --> SATA; y-axis 0 to 12,000]

Page 11

Algorithm conclusions

With a small chunk size and a large SSD, the 2 caching algorithms again gave similar results to each other. However, both were far inferior to the results of the previous run.

Page 12

Algorithm Conclusions (16/8192)

[Bar chart comparing the Dummy, CacheLruNaive and CacheLruSmart algorithms on SATA/R, SATA/W, SSD/R, SSD/W, SATA --> SSD and SSD --> SATA; y-axis 0 to 160,000]

Page 13

General Conclusions

Chunk size greatly affects the runtime of the platform, but runs with a “standard” chunk size do not take long.

Smart use of Boost greatly reduces work and is very effective.

A good implementation can result in huge disk space savings.

Even though the platform has its own data structures, most non-naïve algorithms also need a data structure of their own.

Working with Git source control proved very helpful, both for retrieving old code that was once thought to be obsolete and for collaboration.
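Per-chunk bookkeeping is consistent with the first conclusion, that chunk size drives runtime. Assume the platform does a roughly constant amount of work for each chunk a request overlaps. A request of ℓ bytes at byte offset o, with chunk size c, then touches N chunks. The 4096-byte request below is a hypothetical example, not taken from the trace:

```latex
N(o, \ell, c) = \left\lfloor \frac{o + \ell - 1}{c} \right\rfloor - \left\lfloor \frac{o}{c} \right\rfloor + 1

N(0, 4096, 16) = 255 - 0 + 1 = 256, \qquad N(0, 4096, 2048) = 1 - 0 + 1 = 2
```

Under that assumption, the same aligned request costs about 128 times more chunk operations at 16B chunks than at 2048B chunks.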