RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

32
RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1

Transcript of RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

Page 1: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

1

RAMCloud Design Review

Indexing

Ryan Stutsman

April 1, 2010

Page 2: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

2

Introduction

• Should RAMCloud provide indexing?o Leave indexes to client-side using transactions?

• Many apps have similar indexing needso Hash indexes, B+Trees, etc.o Can reduce app visible latency for indexes by optimizing

server-side

Page 3: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

3

Implementation Issues

• Indexing on “opaque” data• Splitting Indexes• Consistency• Recovery/Availability of Indexes

Page 4: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

4

Explicit Search Keys

• Problem: RAMCloud treats objects as opaqueo Server-side indexing without understanding the data?

Max Power (650) 555-5555

put(tableId, person.objectId, person.pickle())

Page 5: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

5

Explicit Search Keys

• Problem: RAMCloud treats objects as opaqueo Server-side indexing without understanding the data?

• Idea: Apps provide search keys explicitlyo Apps understand the data

put(tableId, person.objectId, {‘first’: person.first, ‘last’: person.last}, person.pickle())

Powerlast field IDfirst field ID Max Max Power (650) 555-5555

Page 6: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

6

Explicit Search Keys

• Problem: RAMCloud treats objects as opaqueo Server-side indexing without understanding the data?

• Idea: Apps provide search keys explicitlyo Apps understand the data

• Can eliminate redundancyo Search keys need not be repeated in objecto Search keys + Blob are returned to app on get/lookup

put(tableId, person.objectId, {‘first’: person.first, ‘last’: person.last}, person.pickle())

Powerlast field IDfirst field ID Max (650) 555-5555

Page 7: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

7

Explicit Search Keys

• Put atomically updates indexes and objecto Details to follow

put(tableId, objectId, searchKeys, blob)

get(tableId, objectId) –> (searchKeys, blob)

lookup(tableId, indexName, searchValue) -> (searchKeys, blob)

Page 8: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

8

Splitting Indexes

• Co-locate index and data

• Large tables?• Large indexes?

o Can’t avoid multi-machine operations

IndexA-Z

Data0-99

Master CMaster B

Data0-299

Master A

IndexA-Z

Master A

Page 9: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

9

Splitting Indexes

• Split indexes on search key

o One extra access per lookup and put

• Split indexes on object ID

o Lookups go to all index fragmentso Puts are always local

• Our decision (for now): On search keyo Don’t want weakest-link lookup performance

Index200-299

Data200-299

Index100-199

Data100-199

Index0-99

Data0-99

Data100-299

IndexA-R

IndexS-Z

Data0-99

Page 10: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

10

Consistency

• Problem: Index/Object inconsistency on putso Object and index may reside on different hostso Apps can get objects that aren’t in the index yeto Apps may see index entries for objects not in table yet

• Avoid commit protocol• Idea: Index entries “commit” on object put

o Write index entrieso Then write object to tableo Index entries considered invalid until object written

• Turns atomic puts into atomic index updates

Page 11: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

11

Consistency

Powell 300

Powers 299

Mary 299

Mel 300

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 12: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

12

Consistency: Lookup

Powell 300

Powers 299

lookup(0, ‘last’, ‘Power’)

Mary 299

Mel 300

• Request goes directly to correct indexo “Not found” returns immediately

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 13: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

13

Consistency: Lookup

Powell 300

Powers 299

lookup(0, ‘last’, ‘Powell’)

Mary 299

Mel 300

‘Powell’ == ‘Powell’ ok

• Consistency is checked on hito If table and index agree the return the objecto Else “not found”

300lastName Index

firstName Index

300

Mary Powers Mel Powell

Data Table

299 300

Page 14: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

14

Consistency: Create

Powell 300

Powers 299

put(0, 301, {‘first’: ‘Max’, ‘last’: ‘Power’}, person.pickle())

Mary 299

Mel 300

• Insert index entries before writing object

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 15: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

15

Consistency: Create

Powell 300

Power 301

Powers 299

put(0, 301, {‘first’: ‘Max’, ‘last’: ‘Power’}, person.pickle())

Mary 299

Mel 300

• Insert index entries before writing objecto What if a lookup happens in the meantime?

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 16: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

16

Consistency: Concurrent Lookup

Powell 300

Power 301

Powers 299

put(0, 301, {‘first’: ‘Max’, ‘last’: ‘Power’}, person.pickle())

Mary 299

Mel 300

lookup(0, ‘last’, ‘Power’)

• Concurrent ops ignore inconsistent entries

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 17: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

17

Mary Powers Mel Powell

Data Table

299 300

Consistency: Concurrent Lookup

Powell 300

Power 301

Powers 299

put(0, 301, {‘first’: ‘Max’, ‘last’: ‘Power’}, person.pickle())

Mary 299

Mel 300

lookup(0, ‘last’, ‘Power’)

Not Found

• Concurrent ops ignore inconsistent entries

301

lastName Index

firstName Index

301

Page 18: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

18

Consistency: Create (continued)

Powell 300

Power 301

Powers 299

put(0, 301, {‘first’: ‘Max’, ‘last’: ‘Power’}, person.pickle())

Mary 299

Max 301

Mel 300

• Insert index entries before writing object

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 19: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

19

Mary Powers Mel Powell

Data Table

299 300

Consistency: Create

Powell 300

Power 301

Powers 299

put(0, 301, {‘first’: ‘Max’, ‘last’: ‘Power’}, person.pickle())

Mary 299

Max 301

Mel 300

Max Power

• Put completes; index entries now valid

lastName Index

firstName Index

301

Page 20: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

20

Consistency: Delete

Powell 300

Power 301

Powers 299

delete(0, 301)

Mary 299

Max 301

Mel 300

Max Power

• Delete object first, then cleanup index entrieso Index entries are invalid with no corresponding object

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Max Power

301

Page 21: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

21

Consistency: Delete

Powell 300

Power 301

Powers 299

delete(0, 301)

Mary 299

Max 301

Mel 300

• Delete object first, then cleanup index entrieso Index entries are invalid with no corresponding object

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 22: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

22

Mary Powers Mel Powell

Data Table

299 300

Consistency: Delete

Powell 300

Powers 299

delete(0, 301)

Mary 299

Mel 300

• Delete object first, then cleanup index entrieso Index entries are invalid with no corresponding object

lastName Index

firstName Index

Page 23: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

23

Consistency: Update

Powell 300

Powers 299

put(0, 299, {‘first’: ‘Mary’, ‘last’: ‘Miller’}, person.pickle())

Mary 299

Mel 300

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 24: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

24

Consistency: Update

Miller 299

Powell 300

Powers 299

put(0, 299, {‘first’: ‘Mary’, ‘last’: ‘Miller’}, person.pickle())

Mary 299

Mel 300

• Compare previous index entrieso Insert new value if updated

lastName Index

firstName Index

Mary Powers Mel Powell

Data Table

299 300

Page 25: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

25

Consistency: Update

Miller 299

Powell 300

Powers 299

put(0, 299, {‘first’: ‘Mary’, ‘last’: ‘Miller’}, person.pickle())

Mary 299

Mel 300

• Commit by writing the new valueo Old index entries ignored by lookup since inconsistent

lastName Index

firstName Index

Mary Miller Mel Powell

Data Table

299 300

Page 26: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

26

Consistency: Update

Miller 299

Powell 300

put(0, 299, {‘first’: ‘Mary’, ‘last’: ‘Miller’}, person.pickle())

Mary 299

Mel 300

• Cleanup old, inconsistent entries

lastName Index

firstName Index

Mary Miller Mel Powell

Data Table

299 300

Page 27: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

27

Consistency: Thoughts

• Atomic puts give index updates atomicity• Low-latency gives simplified consistency

o Can afford to have a single writer per objecto Provides us with atomic put primitive for free

Page 28: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

28

Index Recovery

• Problem: Unavailable until indexes recovero Many requests will be lookupso These will block until indexes are recovered

• Rebuild versus Store?o Storing comes at a cost to write-bandwidtho Possible using scale we can rebuild faster than store

Page 29: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

29

Index Recovery: Partitioning

•How far does partitioning + rebuilding get us?• Worst case: Entire partition of index data only

o At most 640 MBo Larger indexes recovered a partition to a host in parallel

Page 30: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

30

Index Recovery: Partitioning

Recover a single index partition on a new master:

1. Data partitions scan, extract index entries (0.6s)o Hashtable: 10 million lookups/seco 640 MB / 100 byte/object = 6.4 million objects

2. Transmit entries to new index partition (0.6s)o At most 640 MB @ 10 Gbit/s

3. New index master reinsert entries (0.6s) Similar time to master hashtable scan

• All operations are pipelinedo 0.6s to scan, extract, transmit, rebuild total

• If data partitions for index in recovery add 0.6so 1.2s upper bound for conservative 100b object size

Page 31: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

31

Summary

• Explicit search keys both flexible and efficient• Split indexes on search key for fast lookup• Atomic puts simplify atomic indexes• Scale drives index recovery for availability

Page 32: RAMCloud Design Review Indexing Ryan Stutsman April 1, 2010 1.

32

Discussion