Consistent Join Queries for Cloud Data Stores

Zhou Wei 1,2, Guillaume Pierre 1, Chi-Hung Chi 2
1 VU University Amsterdam, 2 Tsinghua University Beijing


Transcript of Consistent Join Queries for Cloud Data Stores

Page 1: Consistent Join Queries for  Cloud Data Stores

Consistent Join Queries for Cloud Data Stores

Zhou Wei 1,2, Guillaume Pierre 1, Chi-Hung Chi 2

1 VU University Amsterdam
2 Tsinghua University Beijing

Page 2: Consistent Join Queries for  Cloud Data Stores

Background

• Cloud Computing Platforms for Web Apps
– Unlimited resources
– Scalable NoSQL data stores
– Pay-as-you-go

Page 3: Consistent Join Queries for  Cloud Data Stores

Problems of Cloud data stores

• Problem 1: No join queries
– Only queries within a single table are supported
– No queries across tables

• Problem 2: Relaxed data consistency
– Single-row transactions only
– No consistency guarantee across multiple items

• Web apps need consistent join queries

Page 4: Consistent Join Queries for  Cloud Data Stores

Why do Web apps need joins?

• Data normalization leads to foreign-key equi-joins

Single table (before normalization)
PK  Title   Comment
1   IPhone  Cool
2   IPhone  Nice
3   IPhone  Expensive

Post
PK  Title
1   IPhone

Comment
PK  Post_id  Comment
1   1        Cool
2   1        Nice
3   1        Expensive

Page 5: Consistent Join Queries for  Cloud Data Stores

Why consistent?

• Application correctness is not optional
• Correctness is hard to maintain on top of an inconsistent data store
– Requires skilled programmers
– Error-prone
– Costs more time

Page 6: Consistent Join Queries for  Cloud Data Stores

Alternatives

• SQL databases (MySQL, Oracle, …)
– Centralized
– Full replication
– NOT scalable to update workload
– Complicated queries
– ACID transactions

• NoSQL data stores (Google Bigtable, Amazon SimpleDB, Cassandra, …)
– Distributed
– Partitioned
– Scalable to update workload
– No join queries
– Single-row transactions

Page 7: Consistent Join Queries for  Cloud Data Stores

Our Position

• Scalable NoSQL data stores that support equi-join queries for Web applications
• Each join query runs as an ACID transaction
• Queries of Web applications access only a small portion of the data
– Also true for join queries
– All of them are foreign-key equi-joins

Page 8: Consistent Join Queries for  Cloud Data Stores


How does it work?

[Diagram: the Web Application sends Transactions and Join Queries to CloudTPS, a set of transaction managers (TM); the TMs Load Data from the Cloud Data Store and Checkpoint updates back to it]

Page 9: Consistent Join Queries for  Cloud Data Stores

Outline

• Join Algorithm
• Transaction Protocol
• Performance Evaluation
• Conclusion

Page 10: Consistent Join Queries for  Cloud Data Stores

Join Algorithm

• Web applications are response-time sensitive
– Table scan is not an option

• The basic idea
– Link records by FK relationships in advance
– Each join query has well-identified “root records”
– Begin by accessing each “root record”
– Recursively obtain its related records

Page 11: Consistent Join Queries for  Cloud Data Stores

Link records for forward join

Comment
PK  FK
1   2
2   3
3   2

Post
PK
1
2
3

Given a referring row (FK), identify the referenced row (PK)

Page 12: Consistent Join Queries for  Cloud Data Stores

Link records for backward join

Comment
PK  FK
1   2
2   3
3   2

Post (with Index column)
PK  Comment
1
2   (1,3)
3   2

Given a referenced row (PK), identify the referring rows (FK)
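The two link directions can be illustrated with a minimal sketch. It assumes the Comment and Post rows shown on these slides and uses plain dictionaries as a stand-in for the data store; the function names are hypothetical and this is not CloudTPS code.

```python
# Comment rows from the slides: primary key -> foreign key (the referenced post).
comment_fk = {1: 2, 2: 3, 3: 2}
post_pks = [1, 2, 3]

def forward(comment_pk):
    """Forward join: the FK stored in the referring row is the referenced PK."""
    return comment_fk[comment_pk]

def build_backward_index(fk_by_pk, referenced_pks):
    """Backward join: for each referenced PK, list the PKs of the referring rows."""
    index = {pk: [] for pk in referenced_pks}
    for pk, fk in sorted(fk_by_pk.items()):
        index[fk].append(pk)
    return index

post_index = build_backward_index(comment_fk, post_pks)
print(forward(1))    # 2
print(post_index)    # {1: [], 2: [1, 3], 3: [2]}
```

The forward direction needs no extra structure, while the backward direction needs an index that is built in advance so that no table scan is required at query time.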

Page 13: Consistent Join Queries for  Cloud Data Stores

Joining two tables

SELECT * FROM post, comment WHERE post.id = 1 and comment.post_id = post.id

[Diagram: query plan rooted at post (id=1), joined to comment on post.id = comment.post_id]
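A rough sketch of how this query could be answered without a table scan: start at the root record (post 1) and follow a pre-built index to its comments. The dictionaries and the `join_post_comment` helper are illustrative assumptions, with sample rows borrowed from the normalization slide.

```python
# Hypothetical in-memory stand-ins for the two tables and the link index.
post = {1: {"id": 1, "title": "IPhone"}}
comment = {
    1: {"id": 1, "post_id": 1, "comment": "Cool"},
    2: {"id": 2, "post_id": 1, "comment": "Nice"},
    3: {"id": 3, "post_id": 1, "comment": "Expensive"},
}
comments_by_post = {1: [1, 2, 3]}  # backward index: post.id -> comment ids

def join_post_comment(post_id):
    """Evaluate: post.id = post_id AND comment.post_id = post.id."""
    root = post.get(post_id)  # access the root record directly by key
    if root is None:
        return []
    # Follow the index instead of scanning the comment table.
    return [(root, comment[cid]) for cid in comments_by_post.get(post_id, [])]

rows = join_post_comment(1)
for p, c in rows:
    print(p["title"], c["comment"])
```

Only the root record and the rows it links to are touched, which matches the claim that Web-application joins access a small portion of the data.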

Page 14: Consistent Join Queries for  Cloud Data Stores

Multi-Joins

• The table of the “root record” is the root

• Recursively join child tables

• Length of critical execution path = 2

[Diagram: join tree rooted at Post (Id=1), with the tables Comment, User, and Revision below it]
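The recursion and the notion of critical execution path can be sketched as follows. The exact shape of the tree is not recoverable from the transcript, so the layout below (Post joined to Comment and Revision, Comment joined to User) is an assumption chosen to give a path length of 2; all record keys are made up.

```python
# Hypothetical join tree: each table maps to the child tables joined from it.
join_tree = {"Post": ["Comment", "Revision"], "Comment": ["User"],
             "Revision": [], "User": []}

# Hypothetical link index: (table, pk) -> {child table: [child pks]}.
links = {
    ("Post", 1): {"Comment": [10, 11], "Revision": [20]},
    ("Comment", 10): {"User": [100]},
    ("Comment", 11): {"User": [101]},
}

def collect(table, pk):
    """Recursively gather the root record and every record linked to it."""
    found = [(table, pk)]
    for child in join_tree[table]:
        for child_pk in links.get((table, pk), {}).get(child, []):
            found.extend(collect(child, child_pk))
    return found

def critical_path(table):
    """Number of sequential join steps below `table` (height of the join tree)."""
    children = join_tree[table]
    return 0 if not children else 1 + max(critical_path(c) for c in children)

records = collect("Post", 1)
print(len(records), critical_path("Post"))
```

Sibling tables can in principle be fetched in parallel, so the tree height, rather than the number of tables, bounds the number of sequential lookups.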

Page 15: Consistent Join Queries for  Cloud Data Stores

Outline

• Join Algorithm
• Transaction Protocol
• Performance Evaluation
• Conclusion

Page 16: Consistent Join Queries for  Cloud Data Stores

Distributed Transaction

[Diagram: a Transaction is split into sub-transactions SubT(1), SubT(2), and SubT(3), which run on the TMs holding Data1, Data2, and Data3 respectively (each TM is labeled RW-TC) and are committed with 2 Phase Commit (2PC)]

Page 17: Consistent Join Queries for  Cloud Data Stores

Problem

• 2PC requires all records to be identified in advance
• Join queries and index management identify accessed records on the fly
• Solution: extend 2PC to allow adding records dynamically

Page 18: Consistent Join Queries for  Cloud Data Stores

The Original 2PC

[Diagram: three steps between the TC and the participants Data1 and Data2: 1. Submit, 2. Vote, 3. Commit/Abort]

Page 19: Consistent Join Queries for  Cloud Data Stores

Extended 2PC

• Dynamically add data items to the transaction

[Diagram: 1. Submit from the TC to Data1 and Data2; 2. Vote, where the replies are "Commit" and "Add Data3"]

Page 20: Consistent Join Queries for  Cloud Data Stores

Extended 2PC

[Diagram: 3. Derive, with Data3 now involved alongside Data1 and Data2]

Page 21: Consistent Join Queries for  Cloud Data Stores

Extended 2PC

[Diagram: 4. Vote, where Data3 replies "Commit"]

Page 22: Consistent Join Queries for  Cloud Data Stores

Extended 2PC

[Diagram: 5. Commit sent by the TC to Data1, Data2, and Data3]
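The five steps can be approximated by a coordinator loop that keeps collecting votes until no participant asks for more data items. This is a single-process sketch under stated assumptions: the `Participant` class, the `adds` parameter, and the rule of stopping at the first abort vote are all hypothetical, and networking, logging, and failure handling are left out.

```python
class Participant:
    """Hypothetical stand-in for the node holding one data item."""
    def __init__(self, name, adds=(), vote="COMMIT"):
        self.name, self.adds, self.vote_value = name, list(adds), vote
        self.state = "IDLE"

    def vote(self):
        # A vote may carry extra data items discovered during local execution.
        self.state = "VOTED"
        return self.vote_value, self.adds

    def finish(self, decision):
        self.state = decision

def extended_2pc(initial, registry):
    """Collect votes until no participant adds new items, then decide."""
    involved, pending, decision = [], list(initial), "COMMIT"
    while pending and decision == "COMMIT":
        name = pending.pop(0)
        if name in involved:
            continue
        involved.append(name)
        vote, extra = registry[name].vote()
        if vote == "COMMIT":
            pending.extend(extra)  # "Add Data3": enlarge the participant set
        else:
            decision = "ABORT"
    for name in involved:          # final phase: everyone gets the same decision
        registry[name].finish(decision)
    return decision, involved

registry = {
    "Data1": Participant("Data1"),
    "Data2": Participant("Data2", adds=["Data3"]),
    "Data3": Participant("Data3"),
}
decision, involved = extended_2pc(["Data1", "Data2"], registry)
print(decision, involved)
```

The difference from the original protocol is that the participant set is not fixed at submit time: the decision is taken only once every item discovered along the way has voted.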

Page 23: Consistent Join Queries for  Cloud Data Stores

Outline

• Join Algorithm
• Transaction Protocol
• Performance Evaluation
• Conclusion

Page 24: Consistent Join Queries for  Cloud Data Stores

Experiment Setup

• Implemented a prototype of CloudTPS
– Java Servlet
– Deployed in Tomcat v6.0

• Environments
– 1. DAS-3 + HBase + Tomcat (99% reqs < 100ms)
– 2. EC2 + SimpleDB + Tomcat (90% reqs < 100ms)
  • High-CPU medium instances

Page 25: Consistent Join Queries for  Cloud Data Stores

Micro-benchmarks

• Generate specific workloads
– Containing only join queries or only RW transactions
– Changing the number of accessed data items
– Changing the length of the critical execution path
– Switching between updating indexes and not updating them

• All configured with 10 CloudTPS nodes
• Measuring maximum sustainable throughput
– Under a response-time constraint

Page 26: Consistent Join Queries for  Cloud Data Stores

Two-table join evaluation

[Two plots: Transactions Per Second and Records Per Second, for an increasing number of accessed records in a transaction]

Page 27: Consistent Join Queries for  Cloud Data Stores

Multi-Joins Evaluation

Increasing the height of the join-query tree
All join queries access 12 records

Page 28: Consistent Join Queries for  Cloud Data Stores

RW transactions evaluation

[Two plots: Transactions Per Second and Records Per Second, for an increasing number of accessed records in a transaction]

Page 29: Consistent Join Queries for  Cloud Data Stores

Scalability Evaluation

• Macro-benchmark
– Derived from TPC-W, simulating an online book store
– Our workload includes only the join queries and RW transactions
– No simple queries

Page 30: Consistent Join Queries for  Cloud Data Stores

Scalability Results

Linear scalability: any increase in workload can be addressed by adding more machines

Page 31: Consistent Join Queries for  Cloud Data Stores

CloudTPS vs. PostgreSQL

PostgreSQL: binary replication

Page 32: Consistent Join Queries for  Cloud Data Stores

Fault Tolerance

• Recovers from 3 network partitions and 2 machine failures

Page 33: Consistent Join Queries for  Cloud Data Stores

Conclusion

• We demonstrate support for consistent join queries over partitioned and distributed data

• Join queries from Web applications access only a small portion of the data

• CloudTPS scales linearly even under an intensive workload consisting only of join queries and read-write transactions

Page 34: Consistent Join Queries for  Cloud Data Stores

Questions?

Thank you!

Page 35: Consistent Join Queries for  Cloud Data Stores

Details in Macro-benchmark workload

[Two tables: Join queries; RW transactions]

Page 36: Consistent Join Queries for  Cloud Data Stores

Secondary-key Queries

• Select * from Table1 Where SK = ‘B’
• Create a separate index table for the SK
• Translate into a join query

Table 1
PK  SK
1   A
2   B
3   B

Index Table
PK’(SK)  T1
A        1
B        2,3
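As an illustration of the translation, the sketch below builds the index table from Table 1 and answers the `SK = 'B'` query by treating the index row as the root record and following it back to Table 1. The function names are hypothetical and the tables are plain dictionaries.

```python
# Table 1 from the slide: PK -> row with a secondary key (SK).
table1 = {1: {"SK": "A"}, 2: {"SK": "B"}, 3: {"SK": "B"}}

def build_sk_index(table):
    """Index table: the SK value becomes the primary key, listing Table 1 PKs."""
    index = {}
    for pk in sorted(table):
        index.setdefault(table[pk]["SK"], []).append(pk)
    return index

index_table = build_sk_index(table1)

def select_where_sk(value):
    """Root record is the index row keyed by `value`; join it back to Table 1."""
    return [(pk, table1[pk]) for pk in index_table.get(value, [])]

print(index_table)            # {'A': [1], 'B': [2, 3]}
print(select_where_sk("B"))   # [(2, {'SK': 'B'}), (3, {'SK': 'B'})]
```

The secondary-key lookup thus reuses the same mechanism as a backward join, with the index table playing the role of the referenced table.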

Page 37: Consistent Join Queries for  Cloud Data Stores

Index Management

• Indexes link the related records

post
PK  Title
1   Hello
2   Good
3   Well

comment
PK  FK
1   1
2   1
3   2

Index
Post PK  T1
1        1,2
2        3
3        -

Page 38: Consistent Join Queries for  Cloud Data Stores

Index Management (cont.)

• Case: a new comment is inserted
• Indexes are updated atomically

post
PK  Title
1   Hello
2   Good
3   Well

comment
PK  FK
1   1
2   1
3   2
4   2

Index
Post PK  T1
1        1,2
2        3, 4
3        -

Page 39: Consistent Join Queries for  Cloud Data Stores

Index Management (cont.)

• Consistency Violated Case 1

post
PK  Title
1   Hello
2   Good
3   Well

comment
PK  FK
1   1
2   1
3   2
4   2

Index
Post PK  T1
1        1,2
2        3
3        -

? (comment 4 exists, but the index entry for post 2 does not list it)

Page 40: Consistent Join Queries for  Cloud Data Stores

Index Management (cont.)

• Consistency Violated Case 2

post
PK  Title
1   Hello
2   Good
3   OK

comment
PK  FK
1   1
2   1
3   2

Index
Post PK  T1
1        1,2
2        3, 4
3        -

? (the index entry for post 2 lists comment 4, but no such comment exists)
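The two violated cases correspond to applying only one of the two writes. A small sketch makes this concrete: `insert_comment` applies both writes to a copy and returns it as a unit (a single-process stand-in for running them in one transaction), and `violations` reports which kind of inconsistency a state contains. Both helpers and the `state` layout are assumptions made for illustration.

```python
import copy

def insert_comment(state, pk, post_id):
    """Apply both writes to a copy and return it, so they take effect together."""
    new = copy.deepcopy(state)
    new["comment"][pk] = post_id       # write 1: the comment row
    new["index"][post_id].append(pk)   # write 2: the index entry of its post
    return new

def violations(state):
    """Return (comments missing from the index, index entries without a comment)."""
    missing = [pk for pk, fk in state["comment"].items()
               if pk not in state["index"][fk]]
    dangling = [pk for pks in state["index"].values()
                for pk in pks if pk not in state["comment"]]
    return missing, dangling

before = {"comment": {1: 1, 2: 1, 3: 2}, "index": {1: [1, 2], 2: [3], 3: []}}
after = insert_comment(before, 4, 2)

# Case 1: only the comment row was written. Case 2: only the index was written.
case1 = {"comment": {1: 1, 2: 1, 3: 2, 4: 2}, "index": {1: [1, 2], 2: [3], 3: []}}
case2 = {"comment": {1: 1, 2: 1, 3: 2}, "index": {1: [1, 2], 2: [3, 4], 3: []}}
print(violations(after), violations(case1), violations(case2))
# ([], []) ([4], []) ([], [4])
```

In a distributed store the comment row and the index entry may live on different nodes, which is why the update has to run as a multi-item transaction rather than as two independent writes.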

Page 41: Consistent Join Queries for  Cloud Data Stores

Fault tolerance

• Maintain ACID during server failure
– Transactions that reach an agreement of “COMMIT” should commit
– Abort other transactions

• Replication to N transaction managers
– Values of data items
– States of transactions (vote result and “redo logs”)

• Failover of a failed server
– Transfer ownership of data items
– Transfer leadership of transactions being coordinated

Page 42: Consistent Join Queries for  Cloud Data Stores

Optimization for Join Queries

• A join query is a read-only transaction
– Remove the final Commit phase
– Sub-transactions are removed immediately after local execution
– RO transactions will not block other transactions

Page 43: Consistent Join Queries for  Cloud Data Stores


Concurrency Control

Timestamp Ordering

[Diagram: Data1 holds SubT(1) entries and Data2 holds SubT(2) entries, labeled with timestamps 5 and 6]
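For readers unfamiliar with the technique, here is a generic sketch of timestamp ordering at a single data item: pending sub-transactions run in increasing timestamp order, and one that arrives after a later timestamp has already executed is rejected. The `DataItem` class and the rejection rule are textbook-style assumptions, not a description of the exact rule CloudTPS applies.

```python
import heapq

class DataItem:
    """Hypothetical per-item queue that executes sub-transactions by timestamp."""
    def __init__(self, name):
        self.name = name
        self.pending = []        # min-heap of (timestamp, transaction id)
        self.last_executed = 0   # highest timestamp already executed here

    def submit(self, timestamp, txn):
        """Queue a sub-transaction; reject it if it arrives too late."""
        if timestamp <= self.last_executed:
            return False         # a later transaction already ran on this item
        heapq.heappush(self.pending, (timestamp, txn))
        return True

    def run_next(self):
        """Execute the pending sub-transaction with the smallest timestamp."""
        timestamp, txn = heapq.heappop(self.pending)
        self.last_executed = timestamp
        return txn

data1 = DataItem("Data1")
data1.submit(6, "T6")
data1.submit(5, "T5")
order = [data1.run_next(), data1.run_next()]
late = data1.submit(4, "T4")
print(order, late)   # ['T5', 'T6'] False
```

If every data item orders conflicting sub-transactions by the same global timestamps, all items agree on one serial order of transactions.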

Page 44: Consistent Join Queries for  Cloud Data Stores

Read-Only Transaction

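[Diagram: an RO-TC issues Join(Data1) across three TMs (one also labeled RW-TC, holding Data3); Data1 shows ROSubT(1) and SubT(1) entries with timestamps 5, 6, and 7; Data2 shows SubT(2) with timestamp 5]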

Page 45: Consistent Join Queries for  Cloud Data Stores

Read-Only Transaction (Cont.)

[Diagram: same setup with RO-TC, RW-TC, and three TMs; Data1 shows ROSubT(1) with timestamp 6 and SubT(1) with timestamp 7; Data2 is passed on as the next item to access]

Page 46: Consistent Join Queries for  Cloud Data Stores

Read-only Transaction (cont.)

[Diagram: Data2 shows ROSubT(2) with timestamp 6 and reports "None" as further items; the Result is returned to the RO-TC; Data1 shows SubT(1) with timestamp 7]