Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited...

33
Deciphering 2phase commit Ashwin Agrawal Asim Praveen {aagrawal, apraveen}@pivotal.io

Transcript of Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited...

Page 1: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Deciphering 2phase commitAshwin AgrawalAsim Praveen

{aagrawal, apraveen}@pivotal.io

Page 2: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Scale out

● Single instance is limited● Manual attempts at sharding PostgreSQL● FDW based sharding● MPP → distributed databases

Page 3: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Challenge with atomicity

begin;

insert into account values (id = 1 ...);

insert into account values (id = 2 ...);

commit;

shard 1

shard 2

Page 4: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Challenge with atomicity

begin;

insert into account values (id = 1 ...);

insert into account values (id = 2 ...);

commit; shard 1

shard 2

Page 5: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Challenge with atomicity

begin;

insert into account values (id = 1 ...);

insert into account values (id = 2 ...);

commit; shard 1

shard 2

Page 6: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Two phase commit

DTM

shard 2

shard 1

prepare

prepare

Phase 1

Page 7: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Two phase commit

DTM

shard 2

shard 1

prepare

prepare

Phase 1

vote: yes

vote: yes

yes

yes

Page 8: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Two phase commit

DTM

shard 2

shard 1

prepare

prepare

Phase 1

all prepared

vote: yes

vote: yes

yes

yes

Page 9: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Two phase commit

DTM

shard 2

shard 1

prepare

vote: yes

preparevote: yes

DTM

shard 2

shard 1

commit

ack

commitack

Phase 1 Phase 2

yes

yes

commit

commit

all prepared done

Page 10: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: shard failure in phase one

DTM

shard 2

shard 1

prepare

vote: yes

prepare

Phase 1

yes

rollback

Page 11: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: shard failure in phase one

DTM

shard 2

shard 1

prepare

vote: yes

prepare

DTM

shard 2

shard 1

abortack

abort

Phase 1 Phase 2

yes abort

rollback done

Page 12: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: shard failure in phase 2

DTM

shard 2

shard 1

prepare

vote: yes

preparevote: yes

DTM

shard 2

shard 1

commit

ack

commit

Phase 1 Phase 2

yes

yes

commit

all prepared

Page 13: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: recovery of shard 2

shard 2checkpoint

prepared: xid

wait for commit/abort message

DTM

Ack not received from shard 2;

retry sending commit to shard 2

timeline

Page 14: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: DTM crashed in phase 1

DTM

shard 2

shard 1

prepare

prepare

Phase 1

vote

vote

Page 15: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: DTM crashed in phase 1

DTM

shard 2

shard 1

prepare

prepare

DTM

shard 2

shard 1

in doubt xacts

(xid, state)

in doubt xacts(xid, state)

Phase 1 DTM Recovery

vote

vote

Page 16: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: DTM crashed after phase 1

DTM

shard 2

shard 1

prepare

prepare

Phase 1

all prepared

vote: yes

vote: yes

yes

yes

Page 17: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC: DTM crashed after phase 1

DTM

shard 2

shard 1

prepare

prepare

DTM

shard 2

shard 1

commit

ack

commitack

Phase 1

commit

commit

all prepared

DTM Recovery

done

vote: yes

vote: yes

yes

yes

Page 18: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

2PC vs 1PC● Prepare phase

○ 1 network round trip○ 1 disk flush

● Commit phase○ 1 network round trip○ 1 disk flush

● 2PC guarantees A and D of ACID

Page 19: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Single node snapshot isolationTuple headers contain:

• xmin: transaction ID of inserting transaction

• xmax: transaction ID of replacing/deleting transaction (initially NULL)

Basic idea: tuple is visible if xmin is valid and xmax is not. "Valid" means "either committed or the current transaction".

Page 20: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

“Snapshot” filter away active transactionsRules ensuring no transaction committing after the current transaction’s start be considered committed:

● Currently running transactions IDs never considered valid, even if shown committed in pg_clog.

● Transaction ID higher than the current transaction is not valid (future transaction).

Page 21: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Challenge with isolation

shard1A: 10B: 15

A: begin;B: begin;

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

shard2

Page 22: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Challenge with isolation

shard1A: 10B: 15

A: begin;B: begin;

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

B: insert into acc values(id=4, ...);A: insert into acc values(id=2, ...);

B: commit;

shard2B: 20A: 25

Page 23: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Challenge with isolation

shard1A: 10B: 15

A: begin;B: begin;

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

B: insert into acc values(id=4, ...);A: insert into acc values(id=2, ...);

B: commit;A: select * from acc; 1, 2, 4

B is in future for A

B visible to A !!!shard2

B: 20A: 25

Page 24: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

● Global xid and global snapshot provided by DTM

● Gxmin, Gxmax, gInProgress [ ]● tuples contain local xmin/xmax

if (!XidInSnapshot(GS, xid)){ XidInSnapshot(LS, xid)}

Global snapshot

T1G

T1L

T2G

T2L T3L

global snapshot

local snapshot

shard

Page 25: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Global snapshot in action1

A

10

2

B

15shard1

shard2

A: begin; GXID: 1B: begin; GXID: 2

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

Page 26: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Global snapshot in action1

A

10

2

B

15shard1

1

A

25

2

B

20shard2

A: begin; GXID: 1B: begin; GXID: 2

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

B: insert into acc values(id=4, ...);A: insert into acc values(id=2, ...);

B: commit;

Page 27: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Global snapshot in action1

A

10

2

B

15shard1

1

A

25

2

B

20shard2

A: begin; GXID: 1B: begin; GXID: 2

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

B: insert into acc values(id=4, ...);A: insert into acc values(id=2, ...);

B: commit;A: select * from acc; 1, 2

Page 28: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Global snapshot with local transaction

A: begin; GXID: 1B: begin; GXID: 2

A: insert into acc values(id=1, ...);B: insert into acc values(id=3, ...);

B: insert into acc values(id=4, ...);A: insert into acc values(id=2, ...);L: select * from acc;

/* 0 rows */B: commit;L: select * from acc; 4

1

A

25

2

B

20 L

1

A

10

2

B

15shard1

shard2

Page 29: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Implementation options● Model

○ Pull■ DTM as a service■ participants join a transaction■ transaction can be initiated by any participant

○ Push■ DTM initiates transaction and decides participants■ MPP databases

Page 30: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

Implementation options● Transactions

○ Global and local■ global and local transaction IDs, snapshots■ mapping between global and local transaction IDs

○ Global only■ only one xid and snapshot across the cluster

Page 31: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

ACID distributed system● Two phase commit ⇒ A and D● Global snapshot ⇒ I

?? Spot the Problem ??

A: begin;A: update acc set ... where id = 1B: begin;B: update acc set ... where id = 2B: update acc set ... where id = 1A: update acc set ... where id = 2

Page 32: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases

ACID distributed system● Two phase commit ⇒ A and D● Global snapshot ⇒ I

● Global lock manager ⇒ C

?? Spot the Problem ??

A: begin;A: update acc set ... where id = 1B: begin;B: update acc set ... where id = 2B: update acc set ... where id = 1A: update acc set ... where id = 2

Page 33: Asim Praveen Ashwin Agrawal - PostgreSQL · 2016. 2. 8. · Scale out Single instance is limited Manual attempts at sharding PostgreSQL FDW based sharding MPP → distributed databases