
Hash in a Flash: Hash Tables for Solid State Devices

S M Faisal*, Tyler Clemons*, Shirish Tatikonda‡, Charu Aggarwal†, Srinivasan Parthasarathy*

*The Ohio State University, Columbus, Ohio
‡IBM Almaden Research Center, San Jose, California
†IBM T.J. Watson Research Center, Yorktown Heights, New York


Motivation and Introduction

Data is growing at a fast pace: scientific data, Twitter, Facebook, Wikipedia, the WWW.

Traditional data mining and IR algorithms require random out-of-core data access. Often the data is too large to fit in memory, so frequent random disk access is expected.


Motivation and Introduction (2)

Traditional hard disk drives can keep pace with storage requirements, but NOT with random-access workloads. Their moving parts are a physical limitation and also contribute to rising energy consumption.

Flash devices have emerged as an alternative: they lack moving parts, offer faster random access, and use less energy. But they have several drawbacks…


Flash Devices

Limited lifetime: a flash device supports a limited number of rewrites, also known as erasures or cleans, which impact response time. Erasures are incurred at the block level; blocks consist of pages, and pages (4 KB-8 KB) are the smallest I/O unit.

Poor random write performance: random writes incur many erasures and lower lifetime.

Efficient sequential write performance: sequential writes lower erasures and increase lifetime.


On Flash Devices, DM, and IR

Flash devices provide fast random read access, which is common to many IR and DM algorithms and data structures.

Hash tables are common in both DM and IR: they are useful for associating keys with values. Counting hash tables associate keys with a frequency, as found in many algorithms that track word frequency; a small sketch follows below. We will examine one such algorithm common to both DM and IR (TF-IDF).

Hash tables exhibit random access for both writes and reads, and random writes are an issue for flash devices.
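A minimal sketch of a counting hash table in Python, assuming a simple dictionary-backed store; the class and method names are illustrative, not the paper's implementation:

# A minimal sketch of a counting hash table: keys map to integer frequencies.
from collections import defaultdict

class CountingHashTable:
    def __init__(self):
        self.counts = defaultdict(int)   # key -> frequency

    def increment(self, key, delta=1):
        # Each update is a read-modify-write, i.e., random access on both
        # the read and the write path -- the pattern that hurts flash.
        self.counts[key] += delta

    def frequency(self, key):
        return self.counts.get(key, 0)

# Example: counting word frequencies, as needed for TF-IDF-style processing.
table = CountingHashTable()
for word in "to be or not to be".split():
    table.increment(word)
print(table.frequency("to"))   # -> 2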


Hash Tables for Flash Devices must:

Reduce erasures/cleans and reduce random writes to the SSD: batch updates.

Maintain reasonable query times.

The data structure must not incur unreasonable disk overhead, nor should it require an unreasonable amount of memory.


Our approach

Our approach makes two key contributions:

We optimize our designs for a counting hash table. This has not been done by previous approaches (A. Anand '10), (D. Andersen '09), (B. Debnath '10), (D. Zeinalipour-Yazti '05).

The primary hash table resides on the flash device. Many designs instead use the SSD as a cache for the HDD (D. Andersen '09), (B. Debnath '10). We anticipate data sets with high random-access and throughput requirements.


Hash Tables for Flash Devices must:

Reduce erasures/cleans and reduce random writes to the SSD by batching updates: create an in-memory structure and target semi-random, block-level updates.

Maintain reasonable query times without unreasonable disk overhead: carefully index keys on disk.

Do not require unreasonable memory: the memory requirement is at most a fixed parameter.


Memory Bounded (MB) Buffering

Updates are hashed into a bucket in RAM and quickly combined in memory (e.g., entries such as (64, 2) and (12, 7)).

When a RAM bucket is full, its updates are batched to the corresponding disk buckets.

If the disk buckets are full, the overflow region is invoked.


Memory Bounded (MB) Buffering

Two-way hash:

On-disk closed hash table: hashed at the page level, updated at the block level, with linear probing for collisions.

In-memory open hash table: hashed at the block level; combines updates; flushed with the merge() operation.

Overflow segment: holds closed hash table excess.

A small sketch of this two-level scheme follows below.
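A minimal sketch of MB buffering in Python, assuming an in-memory open hash of buckets and an on-disk bucket interface (apply_batch, spill_to_overflow); these names and the flush policy are illustrative assumptions, not the paper's code:

# A minimal sketch of Memory Bounded (MB) buffering.
class MemoryBoundedBuffer:
    def __init__(self, disk_buckets, capacity_per_bucket):
        self.disk_buckets = disk_buckets                # closed hash table on the SSD
        self.buckets = [dict() for _ in disk_buckets]   # open hash table in RAM
        self.capacity = capacity_per_bucket

    def update(self, key, delta):
        idx = hash(key) % len(self.buckets)
        bucket = self.buckets[idx]
        bucket[key] = bucket.get(key, 0) + delta        # combine updates in memory
        if len(bucket) >= self.capacity:
            self.merge(idx)

    def merge(self, idx):
        # Batch the combined (key, delta) pairs to the corresponding disk
        # bucket; anything that does not fit goes to the overflow segment.
        leftovers = self.disk_buckets[idx].apply_batch(self.buckets[idx])
        if leftovers:
            self.disk_buckets[idx].spill_to_overflow(leftovers)
        self.buckets[idx].clear()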


Can we improve MB?

MB reduces the number of write operations to the flash device: updates are batched only when the memory buffer is full, the updates are semi-random, and (key, value) changes are maintained in memory.

Query times are reasonable: the memory buffer search is fast, SSD random access and linear probing are relatively fast (see paper), and pages can be prefetched.

But MB has disadvantages. Sequential page-level operations are preferred (fewer block updates), and MB is limited by the amount of available memory; think of large on-disk datasets, where updates may be numerous.


Introduce an On Disk Buffer

Batch updates from memory to disk at the page level: this reduces expensive block-level writes (in both time and cleans) and increases sequential writes.

Increase buffering capability: reduce expensive, non-semi-random block updates, which may decrease cleans.

The search space increases during queries, but this cost is incurred only when inserting and reading concurrently, and the reduced erasure time will decrease latency.


On Disk Buffering

Change Segment (CS): a sequential, log-structured region written with sequential writes.

stage() operation: flushes memory to the CS using fast page-level operations.

merge() operation: invoked when the CS is full; combines the CS with the data segment; less frequent than stage().

What is the structure of the CS? (A sketch of the stage()/merge() flow follows below.)
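A minimal sketch of the stage()/merge() flow in Python, assuming RAM buffer, Change Segment, and Data Segment objects with the listed methods; all of these interfaces are illustrative assumptions:

# A minimal sketch of the stage()/merge() flow with an on-disk Change Segment.
class OnDiskBufferedTable:
    def __init__(self, ram_buffer, change_segment, data_segment):
        self.ram = ram_buffer           # in-memory update buffer
        self.cs = change_segment        # sequential, log-structured on-flash region
        self.data = data_segment        # primary on-flash hash table

    def stage(self):
        # Flush the RAM buffer to the CS with cheap sequential page writes.
        self.cs.append_pages(self.ram.drain())
        if self.cs.is_full():
            self.merge()

    def merge(self):
        # Invoked only when the CS fills, so block-level rewrites of the
        # data segment (and the erasures they cause) stay infrequent.
        self.data.apply(self.cs.drain())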


Change Segment Structure v1

Buckets are assigned to specific Change Segment buckets; each Change Segment bucket is shared by multiple RAM buffer buckets.


Memory Disk Bounded Buffer (MDB)

Associate a CS block with k data blocks: writes are semi-random, and only full CS blocks are merge()d.

Frequently updated blocks may incur numerous (k-1) merge() operations.

Query times incur an additional block read, since the shared CS block is packed with unwanted data.

A small sketch follows below.
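A minimal sketch of the MDB change-segment layout in Python, where each CS block serves a group of k data blocks; k, the CS block interface, and the merge trigger are illustrative assumptions:

# A minimal sketch of MDB: one CS block is shared by k data blocks and is
# merged only when it fills.
class MDBChangeSegment:
    def __init__(self, cs_blocks, k):
        self.cs_blocks = cs_blocks      # one CS block per group of k data blocks
        self.k = k

    def stage(self, data_block_id, updates):
        cs_block = self.cs_blocks[data_block_id // self.k]
        cs_block.append(updates)        # semi-random write into the shared CS block
        if cs_block.is_full():
            # Merge only this CS block with its k data blocks; a hot group
            # of blocks can trigger repeated merges.
            cs_block.merge_into_data_blocks()

    def lookup(self, data_block_id, key):
        # A query must also scan the shared CS block: one extra block read
        # that may be packed with other groups' (unwanted) data.
        return self.cs_blocks[data_block_id // self.k].find(key)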


Change Segment Structure v2

As buckets are flushed, they are written sequentially to the change segment, one page at a time.


MDB-L

No partitions in the CS: frequently updated blocks can take maximum space.

merge() all blocks when the CS is full: potentially expensive, but very infrequent.

Queries are supported by pointers: as blocks are staged onto the CS, their pages are recorded for later retrieval and can be prefetched.

A small sketch follows below.
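A minimal sketch of MDB-L in Python, where the CS is a single unpartitioned sequential log and per-block page pointers let queries prefetch only the relevant CS pages; the log interface and index layout are illustrative assumptions:

# A minimal sketch of MDB-L: an unpartitioned, append-only page log plus a
# per-data-block index of CS pages.
class MDBLChangeSegment:
    def __init__(self, log, num_data_blocks):
        self.log = log                                    # append-only page log on flash
        self.page_index = [[] for _ in range(num_data_blocks)]

    def stage(self, data_block_id, page):
        page_no = self.log.append_page(page)              # strictly sequential write
        self.page_index[data_block_id].append(page_no)    # remember where the page went

    def lookup_pages(self, data_block_id):
        # A query prefetches only the CS pages recorded for this data block,
        # instead of scanning a shared CS block.
        return [self.log.read_page(p) for p in self.page_index[data_block_id]]

    def merge_all(self, data_segment):
        # Invoked when the log is full: apply everything to the data segment
        # at once (potentially expensive, but very infrequent).
        data_segment.apply(self.log.drain())
        self.page_index = [[] for _ in self.page_index]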


Expectations

MB will incur more cleans than MDB or MDB-L: its frequent merge() operations incur block erasures.

MDB and MDB-L will incur slightly higher query times due to the addition of the CS.

MDB and MDB-L will have superior I/O performance: most operations are page level, and fewer erasures lower latency.


Experimental Setup (Application)

TF-IDF: Term Frequency-Inverse Document Frequency. Word importance is highest for infrequent words.

Requires a counting hash table.

Useful in many data mining and IR applications (document classification and search). The scoring is sketched below.
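A minimal sketch of the standard TF-IDF formulation (not the paper's code), computed from two counting hash tables: per-document term counts and corpus-wide document frequencies.

# Standard TF-IDF: rare words in the corpus get the highest weight.
import math

def tfidf(term, doc_term_counts, doc_freq, num_docs):
    tf = doc_term_counts.get(term, 0)   # counting hash table: term -> count in this document
    df = doc_freq.get(term, 0)          # counting hash table: term -> number of documents containing it
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(num_docs / df)

# Example: "flash" appears 3 times in a document and in 10 of 1000 documents.
print(tfidf("flash", {"flash": 3}, {"flash": 10}, 1000))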


Experimental Setup (DataSets)

100,000 random Wikipedia articles: 136M keywords, 9.7M entries.

MemeTracker (August 2009 dump): 402M total entries, 17M unique.


Experimental Setup (Method)

1M random queries were issued during the insertion phase, across 10 random workloads; queries need not be in the table.

We measure query performance, I/O time, and cleans.

We used three SSD configurations: one Single Level Cell (SLC) versus two Multi Level Cell (MLC) configurations. MLC is more popular and cheaper per GB but has a shorter lifetime; SLC has a lower internal error rate and faster response times (see paper for specific configurations).

DiskSim and the Microsoft SSD plugin were used for benchmarking and fine-tuning our SSD.


Results (AVERAGE Query Time)

By varying the in-memory buffer, as a percentage of the data segment, the average query time is reduced by only fractions of a second. This suggests that the majority of the query time is incurred by the disk.


Results (AVERAGE Query Time)

By varying the on-disk buffer, as a percentage of the data segment, the average query time decreases substantially for MDB-L. This reduction is seen in both datasets. MDB requires block reads in the CS.


Results (AVERAGE Query Time)

Using the Wiki dataset, we compared SLC with MLC and observed consistent performance.


Results (AVERAGE I/O)

In this experiment, we set the in-memory buffer to 5% and the CS to 12.5% of the primary hash table size.

Simulation time is highest for MB because of its block erasures (next slide). MDB-L is faster than MDB because of the increased page-level operations.


Results (Cleans/Erasures)

Cleans are extremely low for both MDB and MDB-L relative to MB. This is caused by the page-level sequential operations.

Queries are affected by cleans because the SSD must allocate resources to cleaning and moving data.


Discussion and Conclusion

Flash devices are gaining popularity: low latency, high random read performance, and low energy usage, but limited lifetime and poor random write performance.

Hash tables are useful data structures in many data mining and IR algorithms, but they exhibit random write patterns, which are challenging for flash devices.

We have demonstrated that a proper hash table for flash devices will have an in-memory buffer for batched memory-to-disk updates and an on-disk data buffer with page-level operations.


Future work

Our current designs rely on hash functions that use the mod operator: explore extendible hashing.

Checkpoint methods for crash recovery.

Examine on a real SSD: DiskSim is great for fine-tuning and examining statistics.


Questions?