Life Beyond Distributed Transactions: An Apostate’s Opinion
Pat Helland
Partner Architect
Microsoft Corporation

Apostate: noun “One who renounces a previously held belief.”

Slide 2

Outline
• Introduction
• Data, Transactions, and Scalability
• Messaging across Items
• Partner-State and State-Machines
• Accountants Don’t Use Erasers
• Accurate Representations of Historical Facts
• Managing Uncertainty across Items
• Bounds of Uncertainty in Loosely-Coupled Systems
• Conclusion

Slide 3

Session Objectives and Takeaways
• Distributed transactions aren’t used much in practice
• They are fragile and impede availability
• Local transactions are wonderful!
• Designing for scalability requires planning
• Need to think about separate items as the scope of transactions
• Even when separate items are on the same machine (today), you must plan for them to be repartitioned later
• Interacting across items requires messaging
• Managing the messaging is complex
• Each partner must track the state of its interaction with partner items
• Scalable applications become state-driven workflow
• Surprise: the fine granularity of the participants in the workflow

Today’s Goal: Offer hopefully insightful (incite-ful) opinions about scalable apps.

Slide 4

Pointer to Paper
• Paper delivered at CIDR-2007
• Conference on Innovative Data Systems Research
• http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf
• Terminology changes from the paper:
• Entity → Item
• Activity → Partner-State-Machine

Slide 5

Slide 6

Assumptions (Don’t Have to Prove These… Just Plain Believe Them)

Want Almost-Infinite Scaling
• More of everything… Year by year, bigger and bigger
• If it fits on your machines, multiply by 10; if that fits, multiply by 1000…
• Strive to scale almost linearly (N log N for some big log).

Grown-Ups Don’t Use Distributed Transactions
• The apps using distributed transactions become too fragile…
• Let’s just consider local transactions. Multiple disjoint scopes of serializability

Want Scale-Agnostic Apps
• Two layers to the application: scale-agnostic and scale-aware
• Consider scale-agnostic API

Diagram labels: Application; Upper Layer (Scale-Agnostic Code); Scale-Agnostic API; Lower Layer (Scale-Aware Code); Data; Transaction.

Disjoint Scopes of Serializability
• Assume transactions only within a single machine
• OK, I’ll give you a small cluster… but not all the machines!
• Repartitioning moves data
• To expand the app, some data moves to a new data-store
• Which data can you count on for a transaction?
• Remember, it might get moved…
• What’s on one machine today may get moved to another tomorrow!
• Recall, no transactions may cross machines
• What CAN you tie into a single transaction?

Slide 7

Slide 8

Slide 9

Uniquely Keyed Items
• Not all data may be in a single transaction
• We must collect the data into pieces
• We must annotate the boundaries of the data guaranteed to be transactional
• Must remain transactional even if we repartition!
• An item:
• A collection of data that fits on a single machine
• Identified by a unique key
• Assume the scale-aware-code never partitions an item
• The unique key defines the data that can’t be partitioned

The application’s data is factored into items, each of which has a unique key. Each item will reside on a single machine (ignoring replication & H/A).

ItemKey = “ABC”

ItemKey = “WPB”

ItemKey = “QLA”

ItemKey = “UNB”

Slide 10

Transactions and Items• A transaction may update a single item

• The scale-aware-code (and API) guarantee it • The item is never partitioned

• A transaction must not ever update two items• Even if the two live on one machine today• Tomorrow, they may repartition to different machines…

Diagram: many keyed items (“ABC”, “DEF”, “GHI”, “JKL”, “LMN”, “XYZ”, and others) spread across machines; a transaction encloses a single item.

Repartitioning and Items
• Items allow scaling
• Items remain intact even when repartitioning
• The application can count on the integral nature of each item
• It is OK to know that the entire item is local
• It is OK to work on anything in the item at once

No Promise that Two Different Items Stay on the Same Machine!!
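A minimal sketch of how a scale-agnostic API might confine each transaction to one item while the scale-aware layer stays free to repartition. `ItemStore`, `transact`, `repartition`, and the hash-based placement are hypothetical names and choices made for illustration; they are not from the talk.

```python
import hashlib
from typing import Callable, Dict, List


class ItemStore:
    """Hypothetical scale-aware layer: places each item on one 'machine' by key."""

    def __init__(self, machine_count: int) -> None:
        self.machines: List[Dict[str, dict]] = [{} for _ in range(machine_count)]

    def _machine_for(self, key: str) -> Dict[str, dict]:
        # Placement is a private detail; the app never learns which machine holds a key.
        digest = hashlib.sha256(key.encode()).digest()
        return self.machines[int.from_bytes(digest[:4], "big") % len(self.machines)]

    def transact(self, key: str, update: Callable[[dict], None]) -> dict:
        """Apply `update` to exactly one item, identified by its unique key."""
        home = self._machine_for(key)
        working_copy = dict(home.get(key, {}))
        update(working_copy)        # if this raises, nothing is committed
        home[key] = working_copy    # commit: the whole item changes at once
        return working_copy

    def repartition(self, machine_count: int) -> None:
        """Spread items over a new set of machines; each item stays intact."""
        items = {k: v for machine in self.machines for k, v in machine.items()}
        self.machines = [{} for _ in range(machine_count)]
        for key, value in items.items():
            self._machine_for(key)[key] = value


store = ItemStore(machine_count=2)
store.transact("ABC", lambda item: item.update(balance=100))
store.repartition(machine_count=5)  # the application code above does not change
```

The API offers no way to name two keys in one call, which is the point: the application cannot come to depend on two items sharing a machine.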

Slide 11

Frequently the work won’t fit on one computer!

Slide 12

Thinking about Queries
• Queries just got HARD!
• Certainly can’t do cross-item transactional queries
• The items aren’t in the same scope of serializability
• Perhaps can query on stale versions of the data
• Very useful… just different than classic DB
• Can do distributed queries
• Send partial queries around the network
• Hard as the dataset explodes in size
• Can filter copies of old versions
• Keep a subset on a machine for ad-hoc queries
• Subset becomes a smaller percentage as we scale…

Many Traditional Queries Are Used Today to Implement Items: Gotta Join to Overcome the Normalization of Rows!

Ad-Hoc Queries Get Harder… Scaling Means It Won’t All Fit

Slide 13

Thinking about Alternate Indices
• Items must have a unique key
• Unless you begin with the same key, you aren’t the same
• CANNOT guarantee the alternate index will co-locate with the item’s primary key
• By definition, alternate indices don’t have the same key!
• We must index them with a different key…
• Alternate indices CANNOT be updated in the same transaction as the primary data
• There is no way to guarantee they are on the same machine
• They must be updated in different transactions…

Diagram: the same items reachable through three sets of keys.
Primary keys: PK:123, PK:217, PK:332, PK:589, PK:719
Item keys indexed by 1st alternate key: A1:ABC, A1:DEF, A1:GHI, A1:JKL, A1:MNO
Item keys indexed by 2nd alternate key: A2:abc, A2:def, A2:fgh, A2:ghu, A2:klw
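A small sketch of what "updated in different transactions" can look like: the primary item commits first and leaves a note for the index, and the index entry is changed later in its own transaction. The names `primary`, `by_email`, `index_queue`, and the email field are hypothetical, chosen only for illustration.

```python
from collections import deque

primary = {}            # items stored by primary key
by_email = {}           # alternate index: email -> primary key (lives elsewhere)
index_queue = deque()   # messages from primary items to the index


def update_email(pk: str, email: str) -> None:
    # Transaction 1: touches only the primary item and queues the index change.
    old_email = primary.get(pk, {}).get("email")
    primary[pk] = {"email": email}
    index_queue.append((pk, old_email, email))


def apply_index_updates() -> None:
    # Transaction 2 (later, possibly on another machine): touches only index entries.
    while index_queue:
        pk, old_email, new_email = index_queue.popleft()
        if old_email is not None and by_email.get(old_email) == pk:
            del by_email[old_email]
        by_email[new_email] = pk


update_email("PK:123", "a@example.com")
stale = "a@example.com" not in by_email   # the index lags behind the primary data
apply_index_updates()
```

Between the two transactions a lookup through the alternate key can miss the item or find an old value, so readers of the index have to tolerate staleness.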

Slide 14

Slide 15

Items Are Connected by Messaging
• Items are key-named boundaries for transactional work
• Transactions never span items
• The scale-aware-code may move them to repartition
• The only way to communicate across items is with messaging!
• The scale-aware-code is responsible for finding the correct item (by key-name) and for routing the message to it

Diagram: Item-X and Item-Y, each inside its own boundary of transactions; Item-X sends a message marked “Send To: Item-Y”.

“Messaging” Is in Quotes… Work Is Invoked -- Potentially across Machines -- Definitely across Transactions!

Slide 16

Keeping Notes Before You Speak
• Transactions update the data within an item
• They also update the intent to send a message
• Must not send a message unless the intent commits
• Otherwise, the message could arrive even though the intent to send it aborted with the sending transaction
• Output queues are frequently transactional
• Otherwise even more confusing things can happen

Diagram labels: Transaction; Item; Private Data; App Logic.
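One common way to keep notes before speaking is an outbox table written in the same local transaction as the item's data. The sketch below uses Python's standard `sqlite3` module; the table names, `place_order`, and `dispatch` are hypothetical and stand in for whatever a real platform provides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item_data (key TEXT PRIMARY KEY, status TEXT)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, send_to TEXT, body TEXT,"
    " sent INTEGER DEFAULT 0)"
)
db.commit()


def place_order(order_key: str, fail: bool = False) -> None:
    """Update the item and record the intent to send, in one local transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO item_data VALUES (?, 'placed')", (order_key,))
        db.execute(
            "INSERT INTO outbox (send_to, body) VALUES (?, ?)",
            ("Item-Y", f"order {order_key} placed"),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")


def dispatch(send) -> int:
    """Send only messages whose intent has committed, then mark them as sent."""
    rows = db.execute("SELECT id, send_to, body FROM outbox WHERE sent = 0").fetchall()
    for row_id, send_to, body in rows:
        send(send_to, body)
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


delivered = []
place_order("order-1")
try:
    place_order("order-2", fail=True)  # aborts: no data change and no message
except RuntimeError:
    pass
dispatch(lambda to, body: delivered.append((to, body)))
```

If the process stops between `send` and the `UPDATE`, the message goes out again on the next pass, so this arrangement gives at-least-once sending rather than exactly-once.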

Slide 17

At-Least-Once Delivery Semantics
• Each message is sent at-least-once
• Given infinite time…
• The sender tries and tries and tries until acked
• Eventually, the message is delivered
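The retry loop can be sketched in a few lines. `send_until_acked` and `flaky_transmit` are hypothetical; the lost acknowledgements are simulated with a fixed sequence so the example is deterministic.

```python
import itertools
from typing import Callable


def send_until_acked(message: str, transmit: Callable[[str], bool]) -> int:
    """Retry until the receiver acknowledges; returns the number of attempts.

    A real sender would persist the message and back off between attempts;
    both are left out of this sketch.
    """
    for attempt in itertools.count(1):
        if transmit(message):   # True means an ack came back
            return attempt


received = []
outcomes = iter([False, False, True])   # assumption: the first two acks are lost


def flaky_transmit(message: str) -> bool:
    received.append(message)            # the message itself arrives every time
    return next(outcomes)               # but the ack does not always make it back


attempts = send_until_acked("Msg-A", flaky_transmit)
```

Here the receiver ends up with three copies of `Msg-A`, which is why at-least-once delivery pushes the duplicate problem onto the receiving side.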

Dialogs and Exactly-Once Delivery
• It is Possible to Implement Exactly-Once Delivery Within a Relationship
• Dialog:
• Similar to TCP/IP but Long-Running
• Can Guarantee Exactly-Once Delivery OR Failure-Notification
• Requires Interesting Platform Support
• Not the Topic of this Talk
• See Microsoft SQL Server 2005 – SQL Service Broker

Slide 18

Idempotence: It’s Not a Medical Condition
• Requests get lost…
• Gotta retry them to handle lost requests
• Requests arrive more than once…
• Those pesky retries may actually arrive
• Idempotent means it’s OK to arrive multiple times
• As long as the request is processed at least once, the correct stuff occurs
• In today’s world, you must design your requests to be idempotent

Examples:
• Naturally idempotent: Sweeping the floor
• Naturally idempotent: Read record “X”
• Not idempotent: Baking a cake starting from ingredients
• Idempotent: Baking a cake starting from the shopping list (if money doesn’t matter)
• Not idempotent: Withdrawing $1 billion
• Idempotent: If haven’t yet done Withdrawal #XYZ for $1 billion, then withdraw $1 billion and label as #XYZ
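The two withdrawal examples translate directly into code. This is a sketch with a hypothetical `Account` class; the only idea taken from the slide is labelling the withdrawal and checking the label before acting.

```python
from typing import Set


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.completed: Set[str] = set()   # labels of withdrawals already done

    def withdraw(self, amount: int) -> None:
        """Not idempotent: every retry takes the money again."""
        self.balance -= amount

    def withdraw_once(self, withdrawal_id: str, amount: int) -> None:
        """Idempotent: a repeated request with the same label is a no-op."""
        if withdrawal_id in self.completed:
            return
        self.balance -= amount
        self.completed.add(withdrawal_id)


plain = Account(balance=5)
labelled = Account(balance=5)
for _ in range(3):                     # the same request arrives three times
    plain.withdraw(1)
    labelled.withdraw_once("XYZ", 1)
```

After three deliveries `plain.balance` has dropped by three while `labelled.balance` has dropped by one.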

Slide 19

Out of Order Arrival
• Any message may arrive multiple times
• Even after a long while
• This can be very confusing…
• Lots of possible message deliveries

Applications find it difficult to ensure there are no latent bugs. Esoteric late retries of messages may find untested windows…

Diagram: two items exchanging messages A, B, and C, with retries causing repeated and reordered arrivals (“Arrg!”).

Slide 20

Messages Connect Items
• Messages are the only way into and out of items
• They are produced by transactions
• They are consumed by transactions
• Transactions are local to the item

Slide 21

Diagram: Item-X receives messages from Item-A, Item-B, Item-C, and Item-D, and sends messages to Item-A, Item-B, and Item-C.

Slide 22

Items Connected by Partnerships
• Mostly, messaging occurs between two partner items
• Usually, a two-way exchange moving both items’ state
• Each keeps data about how far its state has advanced…

Diagram labels: Item-W; Item-X; Item-Y; Item-Z.

Slide 23

Tracking with Partner-State-Machines
• Partner-state-machine refers to the knowledge about a partner item
• Descriptions of what messages have been received
• Descriptions of what obligations exist to the partner
• The foundation for workflow to replace distributed transactions
• Two basic observations wrapped up in the “partner-state-machine”
• Work across items is workflow based on two-party relationships
• The granularity of the workflow participant is an item (fine-grained)

Diagram: Item-W, Item-X, Item-Y, and Item-Z, each holding partner-state-machines named for its partners (such as PSM-X, PSM-W, and PSM-Z).

Slide 24

Idempotence, Partners, and Partner-State-Machines
• Partner-state-machines manage idempotence
• They keep track of what’s been seen
• If it’s a repeat, ignore it
• Repeated messages eliminated via partnership

Diagram: Item-Y holds PSM-X (“Seen Msg-A, B, C…”) and Item-X holds PSM-Y (“Seen Msg-1, 2, 3…”); messages A, B, C and 1, 2, 3 travel between the two items.
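A sketch of a partner-state-machine used only for duplicate elimination. The `Item` and `PartnerState` classes and the message identifiers are hypothetical; a fuller version would also record obligations owed to the partner.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PartnerState:
    """What this item knows about one partner: here, just the messages seen."""
    seen: Set[str] = field(default_factory=set)


@dataclass
class Item:
    key: str
    partners: Dict[str, PartnerState] = field(default_factory=dict)
    processed: List[str] = field(default_factory=list)

    def receive(self, from_key: str, msg_id: str) -> bool:
        """Process a message unless this partner already sent it; True if processed."""
        psm = self.partners.setdefault(from_key, PartnerState())
        if msg_id in psm.seen:
            return False                # a repeat: ignore it
        # Recording the message and doing the work both change only this item,
        # so they can share one local transaction.
        psm.seen.add(msg_id)
        self.processed.append(msg_id)
        return True


item_y = Item(key="Item-Y")
for msg_id in ["A", "C", "A", "B", "C", "B"]:   # retries arrive repeated and reordered
    item_y.receive("Item-X", msg_id)
```

Because the seen-set is kept per partner, the same message identifier from a different partner is treated as a different message.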

Slide 25

Retirement of Items
• It is normal for items to retire
• The shipment is shipped
• The order completes
• Activities advance to completion
• Incoming messages are accepted
• No new messages are needed
• Typical for the work of an item to complete…
• Retirement usually means “become read-only”
• Sometimes old items are deleted

Sometimes Items Exist for Long-Lived Purposes:
-- Inventory, Bank-Balance, Customer
-- Called “Resource-Items”

Not the topic for this talk… another talk is needed!

Slide 26

“Append-Only” Data
• Many Kinds of Computing are “Append-Only”
• Lots of observations are made about the world
• Debits, credits, Purchase-Orders, Customer-Change-Requests, etc.
• As time moves on, more observations are added
• You can’t change the history but you can add new observations
• Derived Results May Be Calculated
• Estimate of the “current” inventory
• Frequently inaccurate
• Historic Rollups Are Calculated
• Monthly bank statements

Slide 27

Databases and Transaction Logs
• Transaction Logs Are the Truth
• High-performance & write-only
• Describe ALL the changes to the data
• The Database Is the Current Opinion
• Describes the latest value of the data as perceived by the application

The Database Is a Caching of the Transaction Log! It is the subset of the latest committed values represented in the transaction log…
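The claim that the database is a cache of the log can be shown with a toy replay. The record layout (a key and a new value, with `None` meaning deletion) is an assumption made for this sketch, not a real log format.

```python
from typing import Dict, Iterable, Optional, Tuple

LogRecord = Tuple[str, Optional[int]]   # (key, new value); None means "deleted"


def replay(log: Iterable[LogRecord]) -> Dict[str, int]:
    """Rebuild the 'current opinion' by applying committed changes in order."""
    db: Dict[str, int] = {}
    for key, value in log:
        if value is None:
            db.pop(key, None)
        else:
            db[key] = value             # later records overwrite earlier ones
    return db


log = [("x", 1), ("y", 5), ("x", 2), ("y", None)]
db = replay(log)
```

The log keeps all four changes, while `db` keeps only the latest committed value of each surviving key.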

Slide 28

Accountants, Erasers, and Jail
• Accountants Go to Jail if They Use Erasers!!!
• The normal accounting practices allow for corrections but not updates
• Corrections are added to the information
• The derived values are recalculated
• It is a common application paradigm to keep almost all data as append-only
• The transactions themselves are append-only
• Sometimes they are eventually retired.
• The rollup (derived) summary may be recalculated
• Periodic snapshots of the rollup (derived) data are appended to the record
• E.g. a monthly bank statement
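A sketch of a correction without an eraser: the wrong entry stays in the record and an offsetting entry is appended. `Ledger`, `Entry`, and `correct` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entry:
    amount: int
    memo: str


class Ledger:
    def __init__(self) -> None:
        self._entries: List[Entry] = []

    @property
    def entries(self) -> Tuple[Entry, ...]:
        return tuple(self._entries)     # read-only view: no erasers

    def append(self, amount: int, memo: str) -> None:
        self._entries.append(Entry(amount, memo))

    def correct(self, index: int, right_amount: int) -> None:
        """Fix a mistake by adding an offsetting entry; the original stays."""
        wrong = self._entries[index]
        self.append(right_amount - wrong.amount, f"correction of entry {index}")

    def balance(self) -> int:
        """Derived value, recalculated from the full history."""
        return sum(entry.amount for entry in self._entries)


ledger = Ledger()
ledger.append(100, "deposit")
ledger.append(-30, "rent")              # mistake: the rent was actually 35
ledger.correct(1, -35)
```

The history now holds three entries, and the recalculated balance reflects the corrected rent.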

Slide 29

Slide 30

Versions and Distributed Systems
• Can’t have “the same” data at many locations
• Unless it is a snapshot
• Changing distributed data needs versions
• Creates a snapshot…

Slide 31

Diagram: a Data Owning Service publishes its Price-List as dated versions (Monday’s, Tuesday’s, Wednesday’s Price-List) to Listening Partner Services 1, 5, 7, and 8; each partner holds its own snapshots of the versions it has received.

DAGs of History

Diagram: versions of data (A1, A1.1, A2; B1, B2, B3; C1, C2, C2.1, C3; D1, D1.1, D1.2, D2, D2.1, D3) arranged as a directed acyclic graph.

Slide 33

Tentative Operations
• Items don’t share transactions
• Now what can we do?
• Items may accept tentative operations
• Like a reservation; may be cancelled later
• If cancelled, the receiving item must cope
• Special logic to deal with cancellations

Slide 34

Diagram labels: Item-A; Item-B.

Slide 35

Semantics of Tentative Operations
• Tentative operations must be reorderable
• When cancelled, a compensation must occur
• Other operations may have occurred since
• Operations and cancellations must be reorderable!

Diagram: Item-B receives, in order, (1) a tentative op, (2) a tentative op, and (3) a cancellation, from Item-A and Item-C.

Slide 36

Semantics of Cancellation and Confirmation
• Cancellation
• Cope with not doing tentative operation
• Not undo
• New operation to “make things right”
• Accepting tentative means it’s OK to cancel
• Confirmation
• Relinquish the right to cancel tentative op
• Sometimes time driven
• Hotel rooms confirm in the morning
• Every tentative op confirms or cancels
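A sketch of an item that accepts tentative operations, using the hotel example. `RoomItem`, `OpState`, and the operation identifiers are hypothetical. The sketch assumes each operation carries a unique identifier, and it skips the check for running out of rooms.

```python
from enum import Enum
from typing import Dict, List


class OpState(Enum):
    TENTATIVE = "tentative"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


class RoomItem:
    """Hypothetical hotel item that accepts tentative reservations."""

    def __init__(self, rooms: int) -> None:
        self.free = rooms
        self.ops: Dict[str, OpState] = {}

    def reserve(self, op_id: str) -> None:
        if op_id in self.ops:
            return                      # a retry, or its cancellation arrived first
        self.free -= 1
        self.ops[op_id] = OpState.TENTATIVE

    def cancel(self, op_id: str) -> None:
        state = self.ops.get(op_id)
        if state is OpState.CONFIRMED:
            raise ValueError(f"{op_id} is confirmed; the right to cancel is gone")
        if state is OpState.TENTATIVE:
            self.free += 1              # compensation: a new operation, not an undo
        self.ops[op_id] = OpState.CANCELLED

    def confirm(self, op_id: str) -> None:
        if self.ops.get(op_id) not in (OpState.TENTATIVE, OpState.CONFIRMED):
            raise ValueError(f"{op_id} cannot be confirmed")
        self.ops[op_id] = OpState.CONFIRMED

    def unresolved(self) -> List[str]:
        """Tentative ops that still owe a confirmation or a cancellation."""
        return [op for op, state in self.ops.items() if state is OpState.TENTATIVE]


hotel = RoomItem(rooms=2)
hotel.cancel("res-2")       # the cancellation overtakes its own reservation...
hotel.reserve("res-2")      # ...so the late reservation is ignored
hotel.reserve("res-1")
hotel.confirm("res-1")      # relinquishes the right to cancel
hotel.reserve("res-3")
hotel.cancel("res-3")       # compensation returns the room
```

Remembering a cancellation that arrives before its operation is what makes the two messages reorderable here.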

Slide 37

Increasing & Decreasing Uncertainty
• Each tentative operation increases your uncertainty
• You get more and more confused each time you accept a tentative operation
• Each confirmation or cancellation decreases your uncertainty
• It resolves the confusion imparted by the tentative operation it is confirming or canceling

Diagram: an uncertainty scale running from Less Uncertain to More Uncertain; a Tentative Operation moves toward More Uncertain, and a Cancellation or Confirmation moves toward Less Uncertain.

Slide 38

Bounded Uncertainty
• You can track the worst case situations for data values you are managing
• If you keep inventory, you can know the lowest possible and highest possible values
• Tentative operations move lowest and highest values apart
• This increases uncertainty
• Confirmations and cancellations move lowest and highest values together
• This decreases uncertainty
• Knowing the bounds, you have Bounded Uncertainty

Diagram: Probability plotted against Widget Inventory, bounded by Minimum Widgets Possible and Maximum Widgets Possible.
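The lowest and highest possible values can be tracked with two counters. In this sketch, `WidgetInventory` and the signed-quantity convention (positive for a tentative arrival, negative for a tentative shipment) are assumptions made for illustration.

```python
from typing import Dict, Tuple


class WidgetInventory:
    def __init__(self, on_hand: int) -> None:
        self.lowest = on_hand           # inventory if every doubt goes against us
        self.highest = on_hand          # inventory if every doubt goes our way
        self.pending: Dict[str, int] = {}

    def tentative(self, op_id: str, quantity: int) -> None:
        """quantity > 0 is a tentative arrival; quantity < 0 a tentative shipment."""
        self.pending[op_id] = quantity
        if quantity > 0:
            self.highest += quantity    # it might arrive
        else:
            self.lowest += quantity     # it might leave

    def confirm(self, op_id: str) -> None:
        quantity = self.pending.pop(op_id)
        if quantity > 0:
            self.lowest += quantity     # it definitely arrives
        else:
            self.highest += quantity    # it definitely leaves

    def cancel(self, op_id: str) -> None:
        quantity = self.pending.pop(op_id)
        if quantity > 0:
            self.highest -= quantity    # it will not arrive after all
        else:
            self.lowest -= quantity     # it will not leave after all

    def bounds(self) -> Tuple[int, int]:
        return self.lowest, self.highest


inventory = WidgetInventory(on_hand=100)
inventory.tentative("ship-1", -30)
inventory.tentative("recv-1", +50)
widest = inventory.bounds()             # bounds have moved apart
inventory.confirm("ship-1")
inventory.cancel("recv-1")
settled = inventory.bounds()            # bounds have moved back together
```

With both operations resolved, the lowest and highest values meet again and the inventory is known exactly.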

Slide 39

Acting on Bounded Uncertainty
• Knowing bounds on uncertainty allows many different business rules:
• Refuse an order which may (in the worst case) result in widgets overflowing the warehouse
• Calculate probability of worst case overflowing the warehouse
• Cost of temporary storage vs. value of accepting order…
• Order food for hotel restaurant based on reservations and probabilities
• May result in interesting work by applying risk management algorithms…
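Two of these rules can be written as one-line checks. Both functions are hypothetical, and the expected-cost comparison in `worth_accepting` is only one simple way to weigh storage cost against order value; the talk does not prescribe a formula.

```python
def worst_case_fits(highest_possible: int, incoming: int, capacity: int) -> bool:
    """Conservative rule: accept only if even the worst case fits the warehouse."""
    return highest_possible + incoming <= capacity


def worth_accepting(order_value: float, overflow_probability: float,
                    storage_cost: float) -> bool:
    """Risk-based rule (an assumption of this sketch): accept when the order's
    value exceeds the expected cost of temporary storage."""
    return order_value > overflow_probability * storage_cost


fits = worst_case_fits(highest_possible=150, incoming=40, capacity=200)
too_big = worst_case_fits(highest_possible=150, incoming=60, capacity=200)
accept = worth_accepting(order_value=500.0, overflow_probability=0.1,
                         storage_cost=2000.0)
```

The first rule needs only the upper bound, while the second also needs an estimate of how likely the worst case is.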

Slide 40

Slide 41

Slide 42

Vocabulary and Assertions

New vocabulary for discussing scale:
• Scale-agnostic app: An application that does not need to change to support almost-infinite scaling
• Almost-infinite scaling: An environment demanding rapidly increasing data and computation over time
• Item: A collection of data referenced by a single key; transactional scope of the scale-agnostic app
• Partner-State-Machine: Data used inside one item to describe its workflow state with a single partner item

Assertions about large scale apps:
• Alternate indices aren’t transactionally consistent: As scale increases, the primary and alternate indices cannot be guaranteed to live together
• Items cooperate using fine-grained two-party workflow: No dist-txs workflow; workflow participants are items; work coordinated across pairs

Takeaways
• Scale agnostic application design
• Designing for scale leads you away from distributed transactions
• Local transactions are great; distributed transactions suck
• Programming for scale leads to separate pieces of data called items
• Items must live in separate transactions
• Items are only connected with messaging
• “Classic” workflow but fine-grained
• Separate items mean messaging… but messaging is hard!
• Messages get lost and need retries
• Retries give at-least-once delivery
• Must have idempotent processing of messages
• Coping with idempotent messaging requires “partner-state-machines”
• One PSM per-partner per-side holds the state of the relationship
• The scale-agnostic app uses activities to cope with retries
• PSMs can compose to mask complexity

Slide 43

© 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.