CDS-08 Distributed Systems - cs.anu.edu.au

16
520 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 520 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards 3: Network Layer Service: Transfer of packets inside the network Functions: Routing, addressing, switching, congestion control Examples: IP, X.25 Application Presentation Session Transport Network Data link Physical Application Presentation Session Transport Network Data link Physical Network Data link Physical User data User data OSI Network Layers 517 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 517 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards Application Presentation Session Transport Network Data link Physical Application Presentation Session Transport Network Data link Physical Network Data link Physical User data User data OSI Network Layers 514 8 Distributed Systems Uwe R. Zimmer - The Australian National University Systems, Networks & Concurrency 2020 521 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 521 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards 4: Transport Layer Service: Transfer of data between hosts Functions: Connection establishment, management, termination, flow-control, multiplexing, error detection Examples: TCP, UDP, ISO TP0-TP4 Application Presentation Session Transport Network Data link Physical Application Presentation Session Transport Network Data link Physical Network Data link Physical User data User data OSI Network Layers 518 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 518 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards 1: Physical Layer Service: Transmission of a raw bit stream over a communication channel Functions: Conversion of bits into electrical or optical signals Examples: X.21, Ethernet (cable, detectors & amplifiers) Application Presentation Session Transport Network Data link Physical Application Presentation Session Transport Network Data link Physical Network Data link Physical User data User data OSI Network Layers 515 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 515 of 758 (chapter 8: “Distributed Systems” up to page 641) References for this chapter [Bacon1998] Bacon, J Concurrent Systems Addison Wesley Longman Ltd (2nd edition) 1998 [Ben2006] Ben-Ari, M Principles of Concurrent and Dis- tributed Programming second edition, Prentice-Hall 2006 [Schneider1990] Schneider, Fred Implementing fault-tolerant services using the state machine approach: a tutorial ACM Computing Surveys 1990 vol. 22 (4) pp. 299-319 [Tanenbaum2001] Tanenbaum, Andrew Distributed Systems: Prin- ciples and Paradigms Prentice Hall 2001 [Tanenbaum2003] Tanenbaum, Andrew Computer Networks Prentice Hall, 2003 522 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 522 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards 5: Session Layer Service: Coordination of the dialogue between application programs Functions: Session establishment, management, termination Examples: RPC Application Presentation Session Transport Network Data link Physical Application Presentation Session Transport Network Data link Physical Network Data link Physical User data User data OSI Network Layers 519 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 519 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards 2: Data Link Layer Service: Reliable transfer of frames over a link Functions: Synchronization, error correction, flow control Examples: HDLC (high level data link control protocol), LAP-B (link access procedure, balanced), LAP-D (link access procedure, D-channel), LLC (link level control), … Application Presentation Session Transport Network Data link Physical Application Presentation Session Transport Network Data link Physical Network Data link Physical User data User data OSI Network Layers 516 Distributed Systems © 2020 Uwe R. Zimmer, The Australian National University page 516 of 758 (chapter 8: “Distributed Systems” up to page 641) Network protocols & standards OSI network reference model Standardized as the Open Systems Interconnection (OSI) reference model by the International Standardization Organization (ISO) in 1977 • 7 layer architecture • Connection oriented Hardy implemented anywhere in full … …but its concepts and terminology are widely used, when describing existing and designing new protocols …

Transcript of CDS-08 Distributed Systems - cs.anu.edu.au

520

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

520

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

3: N

etw

ork

Lay

er

• Se

rvic

e: T

ran

sfer

of p

acke

ts in

sid

e th

e n

etw

ork

• Fu

nct

ion

s: R

ou

tin

g, a

dd

ress

ing,

sw

itch

ing,

co

nge

stio

n c

on

tro

l

• Ex

amp

les:

IP, X

.25

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

517

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

517

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

Net

wo

rk

Dat

a lin

k

Phys

ical

Use

r d

ata

Use

r d

ata

OSI

Net

wo

rk L

ayer

s

514

8D

istr

ibut

ed S

yste

ms

Uw

e R

. Zim

mer

- T

he A

ustr

alia

n N

atio

nal U

nive

rsity

Syst

ems,

Net

wo

rks

& C

on

curr

ency

202

0

521

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

521

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

4: T

rans

po

rt L

ayer

• Se

rvic

e: T

ran

sfer

of d

ata

bet

wee

n h

ost

s

• Fu

nct

ion

s: C

on

nec

tio

n e

stab

lish

men

t, m

anag

emen

t, te

rmin

atio

n, fl

ow

-co

ntr

ol,

mu

ltip

lexi

ng,

err

or

det

ecti

on

• Ex

amp

les:

TC

P, U

DP,

ISO

TP0

-TP4

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

518

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

518

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

1: P

hysi

cal L

ayer

• Se

rvic

e: T

ran

smis

sio

n o

f a r

aw b

it s

trea

m

ove

r a

com

mu

nic

atio

n c

han

nel

• Fu

nct

ion

s: C

on

vers

ion

of b

its

into

ele

ctri

cal o

r o

pti

cal s

ign

als

• Ex

amp

les:

X.2

1, E

ther

net

(cab

le, d

etec

tors

& a

mp

lifi e

rs)

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

515

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

515

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Ref

eren

ces

for

this

cha

pter

[ Bac

on1

998 ]

Bac

on

, J

Co

ncu

rren

t Sys

tem

s A

dd

iso

n W

esle

y Lo

ngm

an

Ltd

(2n

d e

dit

ion

) 199

8

[ Ben

2006

] B

en-A

ri, M

Pr

inci

ple

s o

f Co

ncu

rren

t an

d D

is-

trib

ute

d P

rogr

amm

ing

seco

nd

ed

itio

n, P

ren

tice

-Hal

l 200

6

[ Sch

neid

er19

90 ]

Sch

nei

der

, Fre

d

Imp

lem

enti

ng

fau

lt-t

ole

ran

t ser

vice

s u

sin

g th

e st

ate

mac

hin

e ap

pro

ach

: a tu

tori

al

AC

M C

om

pu

tin

g Su

rvey

s 19

90

vol.

22 ( 4

) pp

. 299

-319

[ Tan

enb

aum

2001

] Ta

nen

bau

m, A

nd

rew

D

istr

ibu

ted

Sys

tem

s: P

rin

-ci

ple

s an

d P

arad

igm

s Pr

enti

ce H

all 2

001

[ Tan

enb

aum

2003

] Ta

nen

bau

m, A

nd

rew

C

om

pu

ter N

etw

ork

s Pr

enti

ce H

all,

2003

522

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

522

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

5: S

essi

on

Laye

r

• Se

rvic

e: C

oo

rdin

atio

n o

f th

e d

ialo

gue

bet

wee

n a

pp

licat

ion

pro

gram

s

• Fu

nct

ion

s: S

essi

on

est

ablis

hm

ent,

man

agem

ent,

term

inat

ion

• Ex

amp

les:

RPC

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

519

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

519

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

2: D

ata

Link

Lay

er

• Se

rvic

e: R

elia

ble

tran

sfer

of f

ram

es o

ver

a lin

k

• Fu

nct

ion

s: S

ynch

ron

izat

ion

, err

or

corr

ecti

on

, flo

w c

on

tro

l

• Ex

amp

les:

HD

LC (h

igh

leve

l dat

a lin

k co

ntr

ol p

roto

col)

, LA

P-B

(lin

k ac

cess

pro

ced

ure

, bal

ance

d),

LAP-

D (l

ink

acce

ss p

roce

du

re, D

-ch

ann

el),

LLC

(lin

k le

vel c

on

tro

l), …

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

516

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

516

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

OSI

net

wo

rk r

efer

ence

mo

del

Stan

dar

diz

ed a

s th

eO

pen

Syst

ems

Inte

rcon

nect

ion

(OSI

) ref

eren

ce m

od

el b

y th

e In

tern

atio

nal

Sta

nd

ard

izat

ion

Org

aniz

atio

n (I

SO) i

n 1

977

• 7

laye

r ar

chit

ectu

re

• C

on

nec

tio

n o

rien

ted

Har

dy

imp

lem

ente

d a

nyw

her

e in

full

…b

ut i

ts c

once

pts

and

term

inol

ogy

are

wid

ely

use

d,

wh

en d

escr

ibin

g ex

isti

ng

and

des

ign

ing

new

pro

toco

ls …

529

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

529

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

(SP

I)R

ecei

ve s

hift

regi

ster

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Master

Slave

©20

20U

RZ

iTh

At

liN

tilU

iit

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

S1C

SSl

ave

sele

ctor

Mas

ter

Slav

e 1

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Slav

e 2

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Slav

e 3

MIS

O

MO

SI

SCK CS

MIS

O

MO

SI

SCK CS

S2 S3

Full

du

ple

x w

ith

1

ou

t of x

sla

ves

526

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

526

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Seri

al P

erip

hera

l Int

erfa

ce (S

PI)

Full

Du

ple

x, 4

-wir

e, fl

exib

le c

lock

rat

e

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Master

Slave

523

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

523

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

6: P

rese

ntat

ion

Laye

r

• Se

rvic

e: P

rovi

sio

n o

f pla

tfo

rm in

dep

end

ent c

od

ing

and

en

cryp

tio

n

• Fu

nct

ion

s: C

od

e co

nve

rsio

n, e

ncr

ypti

on

, vir

tual

dev

ices

• Ex

amp

les:

ISO

co

de

con

vers

ion

, PG

P en

cryp

tio

n

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

530

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

530

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

(SP

I)R

ecei

ve s

hift

regi

ster

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Master

Slave

©20

20U

RZ

iTh

At

liN

tilU

iit

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

S1C

SSl

ave

sele

ctor

Mas

ter

Slav

e 1

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Slav

e 2

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Slav

e 3

MIS

O

MO

SI

SCK CS

MIS

O

MO

SI

SCK CS

S2 S3

Co

ncu

rren

t sim

ple

x w

ith

y o

ut o

f x s

lave

s

527

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

527

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Seri

al P

erip

hera

l Int

erfa

ce (S

PI)

©20

20U

RZ

iTh

At

liN

tilU

iit

527

f758

(h

t8

8

MISO

MOSI

SCK

CS

time

SetSa

mp

le Set

Set

Set

Set

Set

Set

Set

Sam

ple

Sam

ple

Sam

ple

Sam

ple

Sam

ple

Sam

ple

Sam

ple

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Master

Slave

Clo

ck p

has

e an

d

po

lari

ty n

eed

to

be

agre

ed u

po

n

524

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

524

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

7: A

pp

licat

ion

Laye

r

• Se

rvic

e: N

etw

ork

acc

ess

for

app

licat

ion

pro

gram

s

• Fu

nct

ion

s: A

pp

licat

ion

/OS

spec

ific

• Ex

amp

les:

API

s fo

r m

ail,

ftp

, ssh

, scp

, dis

cove

ry p

roto

cols

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

531

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

531

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

(SP

I)R

ecei

ve s

hift

regi

ster

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Master

Slave

©20

20U

RZ

iTh

At

liN

tilU

iit

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Mas

ter

Slav

e 1

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Slav

e 2

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

r

Slav

e 3

MIS

O

MO

SI

SCK CS

MIS

O

MO

SI

SCK CS

Co

ncu

rren

t d

aisy

ch

ain

ing

wit

h a

ll sl

aves

528

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

528

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

(SP

I)

Seri

al P

erip

hera

l Int

erfa

ce (S

PI)

pte

p(

)R

ecei

ve s

hift

regi

ster

Tran

smit

shift

reg

iste

r

Clo

ck g

ener

ator

Rec

eive

shi

ft re

gist

er

Tran

smit

shift

reg

iste

rM

ISO

MIS

O

MO

SIM

OSI

SCK

SCK

NSS

CS

Slav

e se

lect

or

Master

Slave

fro

m S

TM32

L4x6

ad

van

ced

AR

M®-

bas

ed 3

2-b

it M

CU

s re

fere

nce

man

ual

: Fig

ure

420

on

pag

e 12

91

1 sh

ift r

egis

ter?

FIFO

s?

Dat

a co

nn

ecte

d to

an

inte

rnal

bu

s?

CR

C?

DM

A?

Spee

d?

525

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

525

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Seri

al P

erip

hera

l Int

erfa

ce (S

PI)

Use

d b

y ga

zilli

on

s o

f dev

ices

… a

nd

it

’s n

ot e

ven

a fo

rmal

sta

nd

ard

!

Sp

eed

on

ly li

mit

ed b

y w

hat

b

oth

sid

es c

an s

urv

ive.

Usu

ally

pu

sh-p

ull

dri

vers

, i.e

. fas

t an

d r

elia

ble

, yet

no

t fri

end

ly to

wro

ng

wir

ing/

pro

gram

min

g.

1.8”

CO

LOR

TFT

LC

D d

isp

lay

fro

m A

daf

ruit

San

Dis

k m

arke

tin

g p

ho

to

538

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

538

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ethe

rnet

/ IE

EE 8

02.1

1

Wir

eles

s lo

cal a

rea

net

wo

rk (W

LAN

) dev

elo

ped

in th

e 90

’s

• Fi

rst s

tan

dar

d a

s IE

EE 8

02.1

1 in

199

7 (1

-2 M

bp

s o

ver

2.4

GH

z).

• Ty

pic

al u

sage

at 5

4 M

bp

s o

ver

2.4

GH

z ca

rrie

r at

20

MH

z b

and

wid

th.

• C

urr

ent s

tan

dar

ds

up

to 7

80 M

bp

s (8

02.1

1ac)

ove

r 5

GH

z ca

rrie

r at

160

MH

z b

and

wid

th.

• Fu

ture

sta

nd

ard

s ar

e d

esig

ned

for

up

to 1

00 G

bp

s o

ver

60 G

Hz

carr

ier.

• D

irec

t rel

atio

n to

IEEE

802

.3 a

nd

sim

ilar

OSI

laye

r as

soci

atio

n.

Car

rier

Sen

se M

ulti

ple

Acc

ess

wit

h C

ollis

ion

Avo

idan

ce (C

SMA

/CA

)

Dir

ect-

Sequ

ence

Spr

ead

Spec

trum

(D

SSS)

535

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

535

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ethe

rnet

/ IE

EE 8

02.3

Loca

l are

a n

etw

ork

(LA

N) d

evel

op

ed b

y X

ero

x in

the

70’s

• 10

Mb

ps

spec

ifica

tio

n 1

.0 b

y D

EC, I

nte

l, &

Xer

ox

in 1

980.

• Fi

rst s

tan

dar

d a

s IE

EE 8

02.3

in 1

983

(10

Mb

ps

ove

r th

ick

co-a

x ca

ble

s).

• cu

rren

tly

1 G

bp

s (8

02.3

ab) c

op

per

cab

le p

ort

s u

sed

in m

ost

des

kto

ps

and

lap

top

s.

• cu

rren

tly

stan

dar

ds

up

to 1

00 G

bp

s (I

EEE

802.

3ba

2010

).

• m

ore

than

85

% o

f cu

rren

t LA

N li

nes

wo

rld

wid

e (a

cco

rdin

g to

the

Inte

rnat

ion

al D

ata

Co

rpo

rati

on

(ID

C))

.

Car

rier

Sen

se M

ulti

ple

Acc

ess

wit

h C

ollis

ion

Det

ecti

on (C

SMA

/CD

)

532

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

532

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

IP

Net

wo

rk

Phys

ical

Use

r d

ata

Use

r d

ata

OSI

Tran

spo

rt

Ap

plic

atio

n

TCP/

IPO

SI

539

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

539

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Blu

eto

oth

Wir

eles

s lo

cal a

rea

net

wo

rk (W

LAN

) dev

elo

ped

in th

e 90

’s w

ith

dif

fere

nt f

eatu

res

than

802

.11:

• Lo

wer

po

wer

co

nsu

mp

tio

n.

• Sh

ort

er r

ange

s.

• Lo

wer

dat

a ra

tes

(typ

ical

ly <

1 M

bp

s).

• A

d-h

oc

net

wo

rkin

g (n

o in

fras

tru

ctu

re r

equ

ired

).

Co

mb

inat

ion

s o

f 802

.11

and

Blu

eto

oth

OSI

laye

rsar

e p

oss

ible

to a

chie

ve th

e re

qu

ired

feat

ure

s se

t.

536

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

536

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ethe

rnet

/ IE

EE 8

02.3

O

SI r

elat

ion

: PH

Y, M

AC

, MA

C-c

lien

t

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

OS

Ire

fere

nce

mo

de

l

Ap

plic

atio

n

Pre

se

nta

tio

n

Se

ssio

n

Tra

nsp

ort

Ne

two

rk

Da

ta lin

k

Physic

al

IEE

E 8

02

.3re

fere

nce

mo

de

l

MA

C-c

lien

t

Me

dia

Acce

ss (

MA

C)

Physic

al (P

HY

)

Up

pe

r-la

ye

rp

roto

co

ls

IEE

E 8

02

-sp

ecific

IEE

E 8

02

.3-s

pe

cific

Me

dia

-sp

ecific

533

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

533

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

Ap

ple

Talk

Fili

ng

Pro

toco

l (A

FP)

Ro

uti

ng

Tab

le

Mai

nte

nan

ce P

rot.

IP

Net

wo

rk

Phys

ical

OSI

Tran

spo

rt

Ap

plic

atio

n

TCP/

IPA

pp

leTa

lk

AT

Up

dat

e B

ased

R

ou

tin

g Pr

oto

col

AT

Tran

sact

ion

Pr

oto

col

Nam

e B

ind

ing

Pro

t.A

T Ec

ho

Pr

oto

col

AT

Dat

a St

ream

Pr

oto

col

AT

Sess

ion

Pr

oto

col

Zo

ne

Info

Pr

oto

col

Prin

ter

Acc

ess

Pro

toco

l

Dat

agra

m D

eliv

ery

Pro

toco

l (D

DP)

Ap

ple

Talk

Add

ress

Res

olu

tio

n Pr

oto

col (

AA

RP)

Eth

erTa

lk L

ink

Acc

ess

Pro

toco

lLo

calT

alk

Lin

k A

cces

s Pr

oto

col

Toke

nTa

lk L

ink

Acc

ess

Pro

toco

lFD

DIT

alk

Lin

k A

cces

s Pr

oto

col

IEEE

802

.3Lo

calT

alk

Toke

n R

ing

IEEE

802

.5FD

DI

540

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

540

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Toke

n R

ing

/ IE

EE 8

02.5

/

Fib

re D

istr

ibut

ed D

ata

Inte

rfac

e (F

DD

I)

• “T

oke

n R

ing

“ d

evel

op

ed b

y IB

M in

the

70’s

• IE

EE 8

02.5

sta

nd

ard

is m

od

elle

d a

fter

the

IBM

To

ken

Rin

g ar

chit

ectu

re(s

pec

ifi ca

tio

ns

are

slig

htl

y d

iffe

ren

t, b

ut b

asic

ally

co

mp

atib

le)

• IB

M T

oke

n R

ing

req

ues

ts a

re s

tar

top

olo

gy a

s w

ell a

s tw

iste

d p

air

cab

les,

wh

ile IE

EE 8

02.5

is u

nsp

ecifi

ed in

top

olo

gy a

nd

med

ium

• Fi

bre

Dis

trib

ute

d D

ata

Inte

rfac

e co

mb

ines

a to

ken

rin

g ar

chit

ectu

re

wit

h a

du

al-r

ing,

fi b

re-o

pti

cal,

ph

ysic

al n

etw

ork

.

Un

like

CSM

A/C

D, T

oken

rin

g is

det

erm

inis

tic

(wit

h r

esp

ect t

o it

s ti

min

g b

ehav

iou

r)

FD

DI i

s de

term

inis

tic

and

failu

re r

esis

tant

No

ne

of t

he

abo

ve is

cu

rren

tly

use

d in

per

form

ance

ori

ente

d a

pp

licat

ion

s.

537

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

537

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ethe

rnet

/ IE

EE 8

02.3

O

SI r

elat

ion

: PH

Y, M

AC

, MA

C-c

lien

t

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

App

licat

ion

Pres

enta

tion

Sess

ion

Tran

spor

t

Net

wor

k

Dat

a lin

k

Phys

ical

Net

wor

k

Dat

a lin

k

Phys

ical

Use

r da

taU

ser

data

OSI

Net

wor

k La

yers

80

2.3

MA

C

Physic

al m

ed

ium

-in

de

pe

nd

en

t la

yer

MA

C C

lie

nt

MII

Physic

al m

ed

ium

-d

ep

en

de

nt

laye

rs

MD

I

80

2.3

MA

C

Physic

al m

ed

ium

-in

de

pe

nd

en

t la

yer

MA

C C

lie

nt

MII

Physic

al m

ed

ium

-d

ep

en

de

nt

laye

rs

MD

I

PH

Y

Lin

k m

ed

ia,

sig

na

l e

nco

din

g,

an

dtr

an

sm

issio

n r

ate

Tra

nsm

issio

n r

ate

MII

= M

ed

ium

-in

de

pe

nd

en

t in

terf

ace

MD

I =

Me

diu

m-d

ep

en

de

nt

inte

rfa

ce

- t

he

lin

k c

on

ne

cto

r

Lin

k

534

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

534

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

IP

Net

wo

rk

Phys

ical

OSI

Ap

ple

Talk

ove

r IP

Eth

erTa

lk L

ink

Acc

ess

Pro

toco

lLo

calT

alk

Lin

k A

cces

s Pr

oto

col

Toke

nTa

lk L

ink

Acc

ess

Pro

toco

lFD

DIT

alk

Lin

k A

cces

s Pr

oto

col

IEEE

802

.3Lo

calT

alk

Toke

n R

ing

IEEE

802

.5FD

DI

Ap

ple

Talk

Fili

ng

Pro

toco

l (A

FP)

Ro

uti

ng

Tab

le

Mai

nte

nan

ce P

rot.

AT

Up

dat

e B

ased

Ro

uti

ng

Pro

toco

lA

T Tr

ansa

ctio

n

Pro

toco

lN

ame

Bin

din

g Pr

oto

col

AT

Ech

o

Pro

toco

l

AT

Dat

a St

ream

Pro

toco

lA

T Se

ssio

n P

roto

col

Zo

ne

Info

Pro

toco

lPr

inte

r A

cces

s Pr

oto

col

Dat

agra

m D

eliv

ery

Pro

toco

l (D

DP)

Ap

ple

Talk

Add

ress

Res

olu

tio

n Pr

oto

col (

AA

RP)

547

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

547

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Dis

trib

uted

Sys

tem

s

Som

e co

mm

on

phe

nom

ena

in d

istr

ibut

ed s

yste

ms

1. U

npre

dict

able

del

ays

(co

mm

un

icat

ion

) A

re w

e d

on

e ye

t?

2. M

issi

ng o

r im

prec

ise

tim

e-ba

se C

ausa

l rel

atio

n o

r te

mp

ora

l rel

atio

n?

3. P

arti

al fa

ilure

s L

ikel

iho

od

of i

nd

ivid

ual

failu

res

incr

ease

s

Lik

elih

oo

d o

f co

mp

lete

failu

re d

ecre

ases

(in

cas

e o

f a g

oo

d d

esig

n)

544

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

544

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Dis

trib

uted

Sys

tem

s

Dis

trib

utio

n!

Mo

tiva

tio

nPo

ssib

ly …

… fi

ts a

n e

xist

ing

phys

ical

dis

trib

utio

n (e

-mai

l sys

tem

, dev

ices

in a

larg

e cr

aft,

…).

… h

igh

perf

orm

ance

du

e to

po

ten

tial

ly h

igh

deg

ree

of p

aral

lel p

roce

ssin

g.

… h

igh

relia

bilit

y/in

tegr

ity

du

e to

red

un

dan

cy o

f har

dw

are

and

so

ftw

are.

… s

cala

ble.

… in

tegr

atio

n o

f h

eter

oge

neo

us

dev

ices

.

Dif

fere

nt s

pec

ifi ca

tio

ns

will

lead

to s

ub

stan

tial

ly d

iffe

ren

t dis

trib

ute

d d

esig

ns.

541

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

541

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Fib

re C

hann

el

• D

evel

op

ed in

the

late

80’

s.

• A

NSI

sta

nd

ard

sin

ce 1

994.

• C

urr

ent s

tan

dar

ds

allo

w fo

r 16

Gb

ps

per

lin

k.

• A

llow

s fo

r th

ree

dif

fere

nt t

op

olo

gies

:

Poi

nt-t

o-po

int:

2 ad

dre

sses

Arb

itra

ted

loop

(sim

ilar

to to

ken

rin

g): 1

27 a

dd

ress

es

det

erm

inis

tic,

rea

l-ti

me

cap

able

Sw

itch

ed fa

bric

: 224

ad

dre

sses

, man

y to

po

logi

es a

nd

co

ncu

rren

t dat

a lin

ks p

oss

ible

• D

efi n

es O

SI e

qu

ival

ent l

ayer

s u

p to

the

sess

ion

leve

l.

Mo

stly

use

d in

sto

rage

arr

ays,

b

ut a

pp

licab

le to

su

per

-co

mp

ute

rs a

nd

hig

h in

tegr

ity

syst

ems

as w

ell.

548

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

548

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Dis

trib

uted

Sys

tem

s

Tim

e in

dis

trib

uted

sys

tem

s

Two

alt

ern

ativ

e st

rate

gies

:

Bas

ed o

n a

shar

ed t

ime

Syn

chro

nize

clo

cks!

Bas

ed o

n se

que

nce

of e

vent

s C

reat

e a

virt

ual t

ime!

545

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

545

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Dis

trib

uted

Sys

tem

s

Wha

t ca

n b

e d

istr

ibut

ed?

• St

ate

C

om

mo

n o

per

atio

ns

on

dis

trib

ute

d d

ata

• Fu

ncti

on

Dis

trib

ute

d o

per

atio

ns

on

cen

tral

dat

a

• St

ate

& F

unct

ion

C

lien

t/se

rver

clu

ster

s

• no

ne o

f tho

se

Pu

re r

eplic

atio

n, r

edu

nd

ancy

542

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

542

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Fib

re C

hann

elM

app

ing

of F

ibre

Ch

ann

el to

OSI

laye

rs:

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

Ap

plic

atio

n

Pres

enta

tio

n

Sess

ion

Tran

spo

rt

Net

wo

rk

Dat

a lin

k

Phys

ical

IP

Phys

ical

Use

r d

ata

Use

r d

ata

OSI

TCP/

IPO

SI

IP

Phys

ical

Ap

plic

atio

n

FC/I

P

FC-0

Ap

plic

atio

n

Fib

reC

hann

el

FC-4

FC-4

FC

-3FC

-2FC

-3

FC-2

FC-1

Tran

spo

rtTr

ansp

ort

Net

wo

rkN

etw

ork

Ap

plic

atio

n

FC-3

Co

mm

on

se

rvic

e

FC-4

Pro

toco

l map

pin

g

FC-2

Ne

two

rk

FC-0

Ph

ysic

al

FC-1

Dat

a li

nk

549

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

549

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Dis

trib

uted

Sys

tem

s

‘Rea

l-ti

me’

clo

cks

are:

• di

scre

te –

i.e.

tim

e is

no

t den

se a

nd

ther

e is

a m

inim

al g

ran

ula

rity

• dr

ift a

ffec

ted:

Max

imal

clo

ck d

rift

d d

efi n

ed a

s:

()

()

Ct

Ct

-1

11

21

21

##

dd

++

-t

t-

^^

hh

oft

en s

pec

ifi ed

as

PPM

(Par

ts-P

er-M

illio

n)

(typ

ical

20

. P

PM in

co

mp

ute

r ap

plic

atio

ns)

©20

20U

we

RZ

imm

erTh

eA

ustr

alia

nN

atio

nalU

nive

rsi

t 're

al-t

ime'

1

1

idea

l clo

ck

d

C 'm

easu

red

tim

e'

1-(1

+d

)-1real

clo

ck

546

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

546

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Dis

trib

uted

Sys

tem

s

Co

mm

on

des

ign

crit

eria

Ach

ieve

De-

coup

ling

/ hig

h d

egre

e o

f lo

cal a

uto

no

my

Coo

pera

tion

rat

her

than

cen

tral

co

ntr

ol

Co

nsi

der

Rel

iabi

lity

Co

nsi

der

Sca

labi

lity

Co

nsi

der

Per

form

ance

543

Dis

trib

ute

d S

yste

ms

© 2

020

Uw

e R

. Zim

mer

, The

Aus

tral

ian

Nat

iona

l Uni

vers

ity

page

543

of 7

58 (c

hapt

er 8

: “D

istr

ibut

ed S

yste

ms”

up

to p

age

641)

Net

wor

k pr

otoc

ols

& s

tand

ards

Infi n

iBan

d

• D

evel

op

ed in

the

late

90’

s

• D

efi n

ed b

y th

e In

fi n

iBan

d T

rad

e A

sso

ciat

ion

(IB

TA) s

ince

199

9.

• C

urr

ent s

tan

dar

ds

allo

w fo

r 25

Gb

ps

per

lin

k.

• Sw

itch

ed fa

bri

c to

po

logi

es.

• C

on

curr

ent d

ata

links

po

ssib

le (c

om

mo

nly

up

to 1

2 3

00 G

bp

s).

• D

efi n

es o

nly

the

dat

a-lin

k la

yer a

nd

par

ts o

f th

e n

etw

ork

laye

r.

• Ex

isti

ng

dev

ices

use

co

pp

er c

able

s (i

nst

ead

of o

pti

cal fi

bre

s).

Mo

stly

use

d in

su

per

-co

mp

ute

rs a

nd

clu

ster

s b

ut a

pp

licab

le to

sto

rage

arr

ays

as w

ell.

Ch

eap

er th

an E

ther

net

or

Fib

reC

han

nel

at h

igh

dat

a-ra

tes.

Sm

all p

acke

ts (o

nly

up

to 4

kB

) an

d n

o s

essi

on

co

ntr

ol.

556

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 556 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time

( ) ( )a b C a C b<" &

Implications:

( ) ( ) ( )C a C b b a< & "J

( ) ( )C a C b a b& z=

( ) ( ) ( ) ?C a C b C c< &=

( ) ( ) ( ) ?C a C b C c< < &

553

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 553 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed critical regions with synchronized clocks

Analysis• No deadlock, no individual starvation, no livelock.

• Minimal request delay: L2 .

• Minimal release delay: L.

• Communications requirements per request: N2 1-^ h messages(can be signifi cantly improved by employing broadcast mechanisms).

• Clock drifts affect fairness, but not integrity of the critical region.

Assumptions: • L is known and constant violation leads to loss of mutual exclusion.

• No messages are lost violation leads to loss of mutual exclusion.

550

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 550 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Synchronize a ‘real-time’ clock (bi-directional)

Resetting the clock drift by regular reference time re-synchronization:

Maximal clock drift d defi ned as:

( ) ( )C t C t-1 11

2 12 1# #d d+ +-t t-^ ^h h

‘real-time’ clock is adjusted forwards & backwards

Calendar timet 'real-time'

C 'measured time'

sync.sync.sync.

ref. time

ref. time

ref. time

real clock

idealclock

557

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 557 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time

( ) ( )a b C a C b<" &

Implications:

( ) ( ) ( ) ( ) ( )C a C b b a a b a b< & " " 0J z=

( ) ( )C a C b a b a b b a& " "/J Jz= = ^ ^h h

( ) ( ) ( ) ?C a C b C c< &=

( ) ( ) ( ) ?C a C b C c< < &

554

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 554 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time [Lamport 1978]

( ) ( )a b C a C b<" &

with a b" being a causal relation between a and b,and ( )C a , ( )C b are the (virtual) times associated with a and b

a b" iff:• a happens earlier than b in the same sequential control-fl ow or

• a denotes the sending event of message m, while b denotes the receiving event of the same message m or

• there is a transitive causal relation between a and b: a e e bn1" " " "f

Notion of concurrency:

a b a b b a& " "/J Jz ^ ^h h

551

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 551 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Synchronize a ‘real-time’ clock (forward only)

Resetting the clock drift by regular reference time re-synchronization:

Maximal clock drift d defi ned as:

( ) ( )C t C t-1 11

2 12 1# #d+ -t t-^ h

‘real-time’ clock is adjusted forwards only

Monotonic timet 'real-time'

C 'measured time'

sync.sync.sync.

ref. time

ref. time

ref. time

idealclock

558

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 558 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time

( ) ( )a b C a C b<" &

Implications:

( ) ( ) ( ) ( ) ( )C a C b b a a b a b< & " " 0J z=

( ) ( )C a C b a b a b b a& " "/J Jz= = ^ ^h h

( ) ( ) ( )C a C b C c c a< & "J= ^ h

( ) ( ) ( )C a C b C c c a< < & "J^ h

555

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 555 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time

( ) ( )a b C a C b<" &

Implications:

( ) ( ) ?C a C b< &

( ) ( ) ?C a C b &=

( ) ( ) ( ) ?C a C b C c< &=

( ) ( ) ( ) ?C a C b C c< < &

552

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 552 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed critical regions with synchronized clocks• 6 times: 6 received Requests: Add to local RequestQueue (ordered by time)6 received Release messages: Delete corresponding Requests in local RequestQueue

1. Create OwnRequest and attach current time-stamp.Add OwnRequest to local RequestQueue (ordered by time). Send OwnRequest to all processes.

2. Delay by L2 (L being the time it takes for a message to reach all network nodes)

3. While Top (RequestQueue) ≠ OwnRequest: delay until new message

4. Enter and leave critical region

5. Send Release-message to all processes.

565

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 565 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed critical regions with a central coordinator

A global, static, central coordinator Invalidates the idea of a distributed system

Enables a very simple mutual exclusion scheme

Therefore:

• A global, central coordinator is employed in some systems … yet …

• … if it fails, a system to come up with a new coordinator is provided.

562

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 562 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed critical regions with logical clocks• 6 times: 6 received Requests:

Add to local RequestQueue (ordered by time) Reply with Acknowledge or OwnRequest

• 6 times: 6 received Release messages: Delete corresponding Requests in local RequestQueue

1. Create OwnRequest and attach current time-stamp. Add OwnRequest to local RequestQueue (ordered by time). Send OwnRequest to all processes.

2. Wait for Top (RequestQueue) = OwnRequest & no outstanding replies3. Enter and leave critical region4. Send Release-message to all processes.

559

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 559 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time

( ) ( )a b C a C b<" &

Implications:

( ) ( ) ( ) ( ) ( )C a C b b a a b a b< & " " 0J z=

( ) ( )C a C b a b a b b a& " "/J Jz= = ^ ^h h

( ) ( ) ( ) ( ) ( )C a C b C c c a a c a c< & " " 0J z= =^ h

( ) ( ) ( ) ( ) ( )C a C b C c c a a c a c< < & " " 0J z=^ h

566

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 566 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Electing a central coordinator (the Bully algorithm)Any process P which notices that the central coordinator is gone, performs:

1. P sends an Election-message to all processes with higher process numbers.

2. P waits for response messages. If no one responds after a pre-defined amount of time: P declares itself the new coordinator and sends out a Coordinator-message to all.

If any process responds, then the election activity for P is over and P waits for a Coordinator-message

All processes Pi perform at all times:

• If Pi receives a Election-message from a process with a lower process number, it responds to the originating process and starts an election process itself (if not running already).

563

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 563 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed critical regions with logical clocks

Analysis• No deadlock, no individual starvation, no livelock.

• Minimal request delay: N 1- requests (1 broadcast) + N 1- replies.

• Minimal release delay: N 1- release messages (or 1 broadcast).

• Communications requirements per request: N3 1-^ h messages (or N 1- messages + 2 broadcasts).

• Clocks are kept recent by the exchanged messages themselves.

Assumptions: • No messages are lost violation leads to stall.

560

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 560 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Virtual (logical) time

Time as derived from causal relations:

25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

P3

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

Message

20 22

26

22 233333 24

227

30

4

27

2242

29

222225

8

22222222222222222222225

29292929229229292929222292929292929299299

9

26 27227272727272227272727777777777777 30

3

26 27272772722727777

0

2 3333

31

36

36

31

3 33333333333

35

3

35

333837

33

343333333333333

3535

444343433334 35

Events in concurrent control fl ows are not ordered.

No global order of time.

567

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 567 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states How to read the current state of a distributed system?

25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

P3

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

Message

20 22

26

22 23333 24

227

30

4

27

224

29

222225

8

22222222222222222222225

29292929222222292292292292929292929292292999999

9

26 727272727272272272727277777777777 30

3

26 272227277277272777777

0

2 333333

31

36

36

31

3 3333333333

35

3

35

3837

33

343333333333333

3535

44434333433434 35

This “god’s eye view” does in fact not exist.

564

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 564 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed critical regions with a token ring structure

1. Organize all processes in a logical or physical ring topology

2. Send one token message to one process

3. 6 times, 6processes: On receiving the token message:1. If required the process enters and leaves a critical section (while holding the token).2. The token is passed along to the next process in the ring.

Assumptions: • Token is not lost violation leads to stall.

(a lost token can be recovered by a number of means – e.g. the ‘election’ scheme following)

561

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 561 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Implementing a virtual (logical) time

1. :P C 0i i6 =

2. :Pi6

6 local events: C C 1i i= + ;6 send events: C C 1i i= + ; Send (message, Ci);6 receive events: Receive (message, Cm); ( , )maxC C C 1i i m= + ;

574

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 574 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

22

26

22 2333 24

227

330

4

27

224

2929229229229

222225

8

2222222222222222225

292922292222929222292292922929292929222929999999

9

26 727272727272727222727272777777777

3

26 2722727727272727777777

0

0

2 333333

31

31 36

363 3333333333

35

3

35

33837

22

21

220

33

343333333333333

3535P1

P2

P1

4443433333434 35

P0

3

330

3 31

0

P3

0

303033

12

• Observer-process P0 (any process) creates a snapshot token ts and saves its local state s0.

• P0 sends ts to all other processes.

571

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 571 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states How to read the current state of a distributed system?

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

20 22

26

22 23333 24

227

330

4

27

224

29

2222225

8

2222222222222222225

2929292992992292929292922229292292929292929229

9

26 727272727272727222727272777777777

0

26 2722727727272727777777

0

2 333333333333333333333333333333333333333333333333 34343433433343434343433434334343434343344

31

31 35

37

33333333333

44444444

35

338

35

34

6

333

36

33333333333333333333333333333333333333333333

333333353533353335353535353535355555

P0

32

30

330

30

12

33337

444440

38888888

Instead: some entity probes and collects local states. What state of the global system has been accumulated?

Sorting the events into past and future events.

568

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 568 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states How to read the current state of a distributed system?

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

22

26

22 23333 24

227

330

4

27

224

29

2222225

8

2222222222222222225

2929292992992292929292922229292292929292929229

9

26 272727272722727272227277777777777726 2722727727272727727777777

6 33837

P0

P3 25

21

2

P1

P2 20

444440363322 3333333 3333333333

33

4443433333434 353353335353

3030

31330

3

3

333

3333333333333333333

313313331303033

12

36

35 36

39

37373737337373737373737337373337337337337373737 336

6

35

3837

33

3434343434343434343434434343434343434343434343344444 33337

444440

8838885

3633

55

33

353535

Instead: some entity probes and collects local states. What state of the global system has been accumulated?

575

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 575 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

22

26

22 2333 24

227

4

33027

2224

29

22225

8

292929929922292922922222922292929292922929999

22222222222222225

9

26 272727272727272272277777777777726 2727272727272227227277777777

2 333333

6

363 33333333333 3

35

33837

33

444443434333343444 35

P0

3

220

3030

31330

3

33

313131

P0

21

220

P1

P2

3636555555 33

3433333333

55555333333333555533333333353335353535353535333333303033

12

• Pi6 which receive ts (as an individual token-message, or as part of another message):

• Save local state si and send si to P0.

• Attach ts to all further messages, which are to be sent to other processes.

• Save ts and ignore all further incoming ts‘s.

572

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 572 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states How to read the current state of a distributed system?

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

20 22

26

22 233333 24

2227

330

4

27

224

29

2222225

8

22222222222222222222225

292929299292229292929292929292929229299999

9

26 72727272722727272727277777777777

0

26 27272772727277222777777

0

2 3333333333333333333333333333333333333333333333 343434343343433434343434343433343434344

31

31 35

3744444444

35

38

35

34

6

3333333333333333333333333333333333333333

363333335353333335353535335555

P0

32

30

330

30

12

333337

4444440

38888888

Instead: some entity probes and collects local states. What state of the global system has been accumulated?

Event in the past receives a message from the future!Division not possible Snapshot inconsistent!

569

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 569 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states How to read the current state of a distributed system?

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

20 22

26

22 23333 24

227

4

33027

224

29

222225

8

22222222222222225

292929299299229292929292222929229292929292929

9

26 2727272727227227227777777777777

3

26 27727277277272227777777

0

2 333333

31

36

36

31

3 3

35

3

35

33837

33

3433333333333

3535

4344 35

P0

3

3

330

3

3

333337

4444440

38888888

12

3131

33333333333333444443434333434444444 35353 3

Instead: some entity probes and collects local states. What state of the global system has been accumulated?

Connecting all the states to a global state.

576

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 576 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

22

26

22 233 24

227

30

4

27

224

29

222225

8

22222222222222222222225

29292929222222292292292292929292929292292999999

9

26 72727272727227227272727777777777726 272227277277272777777

2 333333

6

363 3333333333 3

35

3837

33

44434333433434 35

P0

3

220

3030

31330

3

33

313131

P0

21

220

P1

P2

36365555 33

3433333333333

55555533333335553333333333353335353535335353333333303033

12

• Pi6 which previously received ts and receive a message m without ts:

• Forward m to P0 (this message belongs to the snapshot).

573

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 573 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Snapshot algorithm

• Observer-process P0 (any process) creates a snapshot token ts and saves its local state s0.

• P0 sends ts to all other processes.

• Pi6 which receive ts (as an individual token-message, or as part of another message):

• Save local state si and send si to P0.

• Attach ts to all further messages, which are to be sent to other processes.

• Save ts and ignore all further incoming ts‘s.

• Pi6 which previously received ts and receive a message m without ts:

• Forward m to P0 (this message belongs to the snapshot).

570

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 570 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states

A consistent global state (snapshot) is defi ne by a unique division into:

• “The Past” P (events before the snapshot):( ) ( )e P e e e P2 1 2 1" &/! !

• “The Future” F (events after the snapshot):( ) ( )e F e e e F1 1 2 2" &/! !

583

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 583 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)

ServerClient ServerClient

580

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 580 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

26

2333 24

227

330

4

27

224

29

2222225

8

2222222222222222225

2929292992992292929292922229292292929292929229

9

26 72727272727272722272727277777777726 2722727727272727777777

6 33837

P0

1

220 2222

1

35 36

396

35

33837

33337

444440P3

P2 34333333333

P

666666666666666666666666666666363333666666666666663363636363663636363636663633633666333333333332P1

P0

37 38838883333333333

3635 3

3133313

363555555 3

31 5555555533333333355535333333333335333535353353533333

322 3333333 3333333333

33

444343433433343

3030

31330

3

3333333333333333333

3333333333333333

0

303033

12

Sorting the events into past and future events.

Past and future events uniquely separated Consistent state

577

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 577 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

20 22

26

22 23333 24

227

330

4

27

224

P3

P1

P2

29

2222225

8

2222222222222222225

2929292992992292929292922229292292929292929229

9

26 272727272722727272227277777777777726 2722727727272727727777777

2 333333

0 5

363 3333333333 3 33837

33

54443433333434

P0

3

P0

3030

31330

3 3131

33 35 36

39

7 3

363

35

338375

31

P

34333333333

5555555533333333355535333333333335333535353353533333 666666666666 366666666666666666663633336666666666666633636363636636336363666336363633666333333333333

333333333333333333

303033

12

• Pi6 which receive ts (as an individual token-message, or as part of another message):

• Save local state si and send si to P0.

• Attach ts to all further messages, which are to be sent to other processes.

• Save ts and ignore all further incoming ts‘s.

584

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 584 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client Ring of servers

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client Ring of servers

581

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 581 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Snapshot algorithm

Termination condition?

Either

• Make assumptions about the communication delays in the system.

or

• Count the sent and received messages for each process (include this in the lo-cal state) and keep track of outstanding messages in the observer process.

578

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 578 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

0

24

330

4

27

224

29

222225

8

22222222222222225

292929299299229292929292222929229292929292929

9

26 727277272727272727227227277777777726 27272727272727272727727222277777777

P 25

2P2

P3

20 22

26

22 23333

6 33837

P0

35 36

39

7 3

6

35

33837

34333333333

666666666666 366666666666666636333666666666666666336363636366363636366333663666663333333333332822721P1

P00

3635 3363555555 3

313313331 555555555533333333555333333333353535353533353353533333

322 3333333 3333333333

33

4443434333343443

3030

31330

3

33333333333333333333

33333333333333

303033

12

• Save ts and ignore all further incoming ts‘s.

585

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 585 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Send_To_Group (Job) Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Send_To_Group (JTT ob)

582

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 582 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Consistent distributed statesWhy would we need that?

• Find deadlocks.

• Find termination / completion conditions.

• … any other global safety of liveness property.

• Collect a consistent system state for system backup/restore.

• Collect a consistent system state for further pro-cessing (e.g. distributed databases).

• …

579

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 579 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed states Running the snapshot algorithm:

P3 25

time0 5 10 15 20 25 30 35 40 45

21

20

P1

26 27 29

22

29

P2

31 35 36

23 24 25 26 27 30 31 33 34 35 36 37

30 31 32 33 34 35 36 37 4038 39

27 28 30 37 38

20 22

6

22

P1

233 24

227

25

5

26 30

4

27

224

29

222225

8

22222222222222222222225

2929292929292929222229292922292929292292922299

9

26 77272727727272727272272272277777777726 2727227222727272727727272277777

6

P1

3837

P0

PP

P

35 36

396

35

3837

333337

4444440

343333333333

P3 2

1

P2

P

P

PP

6666666666666666666666666666666666666666666633636363666363636366363633666663333333333333333363333

0

37 3388888333333333333

3635 3363555555 3

313313331 5555555553333333555333333333335333535353533535333333

322 3333333 3333333333

33

444343334334343

3030

31330

3

33333333333333333

33333333333333

303033

12

• Finalize snapshot

592

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 592 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

TransactionsDefi nition (ACID properties):

• Atomicity: All or none of the sub-operations are performed. Atomicity helps achieve crash resilience. If a crash occurs, then it is possible to roll back the system to the state before the transaction was invoked.

• Consistency: Transforms the system from one consistent state to another consistent state.

• Isolation: Results (including partial results) are not revealed unless and until the transaction commits. If the operation accesses a shared data object, invocation does not interfere with other operations on the same object.

• Durability: After a commit, results are guaranteed to persist, even after a subsequent system failure.

589

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 589 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)task body Print_Server is begin loop select

accept Send_To_Server (Print_Job : in Job_Type; Job_Done : out Boolean) do

if not Print_Job in Turned_Down_Jobs then

if Not_Too_Busy then Applied_For_Jobs := Applied_For_Jobs + Print_Job; Next_Server_On_Ring.Contention (Print_Job, Current_Task); requeue Internal_Print_Server.Print_Job_Queue;

else Turned_Down_Jobs := Turned_Down_Jobs + Print_Job; end if;

end if; end Send_To_Server;

(...)

586

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 586 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)

Server

Server

Server

Server

Server Server

Server

Server

Server

Ser

Client Contention messages

Server

erver

Server

Server

Server erver

ver

Server

Serve

Ser

errrr SSSSSS

verServ

rvver

Se r Se

SerSeSS v

Server

rver

Client Contention messages

593

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 593 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

TransactionsDefi nition (ACID properties):

• Atomicity: All or none of the sub-operations are performed. Atomicity helps achieve crash resilience. If a crash occurs, then it is possible to roll back the system to the state before the transaction was invoked.

• Consistency: Transforms the system from one consistent state to another consistent state.

• Isolation: Results (including partial results) are not revealed unless and until the transaction commits. If the operation accesses a shared data object, invocation does not interfere with other operations on the same object.

• Durability: After a commit, results are guaranteed to persist, even after a subsequent system failure.

isis possp ibleisiii possibliblbb

How to ensure consistency

in a distributed system?

Actual isolation and

effi cient concurrency?

Shadow copies?

Actual isolation or the appearance of isolation?

sub operations are performedsub operatioti ns are perfperffffffformeormermermedddddddddd

Atomic operations spanning multiple processes?

What hardware do we

need to assume?

590

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 590 of 758 (chapter 8: “Distributed Systems” up to page 641)

or accept Contention (Print_Job : in Job_Type; Server_Id : in Task_Id) do if Print_Job in AppliedForJobs then if Server_Id = Current_Task then Internal_Print_Server.Start_Print (Print_Job); elsif Server_Id > Current_Task then Internal_Print_Server.Cancel_Print (Print_Job); Next_Server_On_Ring.Contention (Print_Job; Server_Id); else null; -- removing the contention message from ring end if; else Turned_Down_Jobs := Turned_Down_Jobs + Print_Job; Next_Server_On_Ring.Contention (Print_Job; Server_Id); end if; end Contention; or terminate; end select; end loop; end Print_Server;

587

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 587 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Job_Completed (Results)

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Job_Completed (Results)

594

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 594 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transactions

A closer look inside transactions:

• Transactions consist of a sequence of operations.

• If two operations out of two transactions can be performed in any order with the same fi nal effect, they are commutative and not critical for our purposes.

• Idempotent and side-effect free operations are by defi nition commutative.

• All non-commutative operations are considered critical operations.

• Two critical operations as part of two different transactions while affecting the same object are called a confl icting pair of operations.

591

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 591 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transactions

Concurrency and distribution in systems with multiple, interdependent interactions?

Concurrent and distributedclient/server interactions

beyond single remote procedure calls?

588

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 588 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

A distributed server (load balancing)with Ada.Task_Identification; use Ada.Task_Identification;

task type Print_Server is

entry Send_To_Server (Print_Job : in Job_Type; Job_Done : out Boolean); entry Contention (Print_Job : in Job_Type; Server_Id : in Task_Id);

end Print_Server;

601

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 601 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Read (C)

Order

Write (B)ead (eaeaead ((C)Write (A) RReRRR

ead (ad (d (d (d (dd (d ((((((((((((A)A)A)A)A)A)A)A)A)AA)A)A)AA)A)A)A)AAA)A))AA))))) WR

P1 P2P3

• Three confl icting pairs of operations with the same order of execution (pair-wise between processes).

• The order between processes also leads to a global order of processes.

Serializable

598

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 598 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Write (B)

ead (((((((A)A)A)A)A)A)AAA)A)A)AA)A)A)A)))) WR

P1 P2

Serializable

595

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 595 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transactions

A closer look at multiple transactions:

• Any sequential execution of multiple transactions will fulfi l the ACID-properties, by defi nition of a single transaction.

• A concurrent execution (or ‘interleavings’) of multiple transactions might fulfi l the ACID-properties.

If a specifi c concurrent execution can be shown to be equivalent to a specifi c sequential execution of the involved transactions then this specifi c interleaving is called ‘serializable’.

If a concurrent execution (‘interleaving’) ensures that no transaction ever encounters an inconsistent state then it is said to ensure the appearance of isolation.

602

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 602 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Read (C)

Order

Re

Re

W

• Three confl icting pairs of operations with the same order of execution (pair-wise between processes).

• The order between processes also leads to a global order of processes.

Serializable

599

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 599 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A) Write (B)P2

P3

Write (B)

Order

Re

W

P1 P2

• Two confl icting pairs of operations with different orders of executions.

Not serializable.

596

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 596 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Achieving serializability

For the serializability of two transactions it is necessary and suffi cient for the order of their invocations

of all confl icting pairs of operations to be the same for all the objects which are invoked by both transactions.

(Determining order in distributed systems requires logical clocks.)

603

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 603 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)

Read (C)

P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Read (C)

Order

R

W

eaRe

Re

W

P1 P2 P3

• Three confl icting pairs of operations with the same order of execution (pair-wise between processes).

• The order between processes does no longer lead to a global order of processes.

Not serializable

600

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 600 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Read (C)

Order

Re

Re

W

• Three confl icting pairs of operations with the same order of execution (pair-wise between processes).

• The order between processes also leads to a global order of processes.

597

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 597 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Order

Re W

• Two confl icting pairs of operations with the same order of execution.

610

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 610 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transaction schedulers – Optimistic control

Three sequential phases:

1. Read & execute:Create a shadow copy of all involved objects and perform all required operations on the shadow copy and locally (i.e. in isolation).

2. Validate:After local commit, check all occurred interleavings for serializability.

3. Update or abort:

3a. If serializability could be ensured in step 2 then all results of involved transactions are written to all involved objects – in dependency order of the transactions.

3b. Otherwise: destroy shadow copies and start over with the failed transactions.

607

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 607 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transaction schedulers

Three major designs:

• Locking methods:Impose strict mutual exclusion on all critical sections.

• Time-stamp ordering:Note relative starting times and keep order dependencies consistent.

• “Optimistic” methods:Go ahead until a confl ict is observed – then roll back.

604

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 604 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Achieving serializability For the serializability of two transactions it is necessary and suffi cient

for the order of their invocations of all confl icting pairs of operations to be the same

for all the objects which are invoked by both transactions.

• Defi ne: Serialization graph: A directed graph; Vertices i represent transactions Ti; Edges T Ti j" represent an established global order dependency between all confl icting pairs of operations of those two transactions.

For the serializability of multiple transactions it is necessary and suffi cient

that the serialization graph is acyclic.

611

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 611 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transaction schedulers – Optimistic control

Three sequential phases:

1. Read & execute:Create a shadow copy of all involved objects and perform all required operations on the shadow copy and locally (i.e. in isolation).

2. Validate:After local commit, check all occurred interleavings for serializability.

3. Update or abort:

3a. If serializability could be ensured in step 2 then all results of involved transactions are written to all involved objects – in dependency order of the transactions.

3b. Otherwise: destroy shadow copies and start over with the failed transactions.

How to create a consistent copy?

rrrrrrrrresulesulesulesuleeseesuesulesesulesuesuleseesulesuesulesulsuu ts ots ots otts otts ots otts otts ots otts oof inf inf inf inf infff inf inff iff inf iinnnnnnnvolvvolvvolvvolvvolvvolvolvvolvvolvvolvvolvvvvvvvved ted teeded ted teeedeeded td tdd td td transransranransransransransraransaaanannnnsnnsnsnssactiactiacacacactiaactiactictictctitions ons ons ooonsonsonsooonsoonsnnsnsnsslt f i l d t tiHow to update all objects consistently?

((iiiiiiii e iiiiiiin isn isiiii olatolatl tll ion)i )i )i )i ))

Full isolation and maximal concurrency!

Aborts happen after everything has been committed locally.

608

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 608 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transaction schedulers – Locking methodsLocking methods include the possibility of deadlocks careful from here on out …

• Complete resource allocation before the start and release at the end of every transaction:

This will impose a strict sequential execution of all critical transactions.

• (Strict) two-phase locking:Each transaction follows the following two phase pattern during its operation:

• Growing phase: locks can be acquired, but not released.

• Shrinking phase: locks can be released anytime, but not acquired (two phase locking) or locks are released on commit only (strict two phase locking).

Possible deadlocks

Serializable interleavings

Strict isolation (in case of strict two-phase locking)

• Semantic locking: Allow for separate read-only and write-locks

Higher level of concurrency (see also: use of functions in protected objects)

605

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 605 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Read (C)

Order

Write (B)ead (eead (ead (ead (C)Write (A) RReRRR

ead (ad (d (dd (d (d (d (((((((((((A)A)A)A)AA)A)A)A)A)AA)A)A)A)A)AAA)AA)A)A)))))) WR

P1 P2P3

• Three confl icting pairs of operations with the same order of execution (pair-wise between processes).

Serialization graph is acyclic.

Serializable

612

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 612 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed transaction schedulersThree major designs:

• Locking methods: no abortsImpose strict mutual exclusion on all critical sections.

• Time-stamp ordering: potential aborts along the wayNote relative starting times and keep order dependencies consistent.

• “Optimistic” methods: aborts or commits at the very endGo ahead until a confl ict is observed – then roll back.

How to implement “commit” and “abort” operationsin a distributed environment?

609

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 609 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Transaction schedulers – Time stamp orderingAdd a unique time-stamp (any global order criterion) on every transaction upon start. Each involved object can inspect the time-stamps of all requesting transactions.

• Case 1: A transaction with a time-stamp later than all currently active transactions applies: the request is accepted and the transaction can go ahead.

• Alternative case 1 (strict time-stamp ordering): the request is delayed until the currently active earlier transaction has committed.

• Case 2: A transaction with a time-stamp earlier than all currently active transactions applies: the request is not accepted and the applying transaction is to be aborted.

Collision detection rather than collision avoidance No isolation Cascading aborts possible.

Simple implementation, high degree of concurrency– also in a distributed environment, as long as a global event order (time) can be supplied.

606

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 606 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Serializability

time0 5 10 15 20 25 30 35 40 45

Write (A)

Read (C)

P1

Write (C)

Read (A)

Write (B)

P2

P3

Write (B)

Read (C)

Order

R

W

eaRe

Re

W

P1 P2 P3

• Three confl icting pairs of operations with the same order of execution (pair-wise between processes).

Serialization graph is cyclic.

Not serializable

619

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 619 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolPhase 1: Determine result state

0 Uwe R. Zimmer, The Australian National University page 619 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Coordinator requests and assembles votes:"Commit" or "Abort"

616

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 616 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolStart up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 616 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Determine coordinator

613

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 613 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolStart up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 613 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

ver

rver

rver

ver

rver

Ring of servers

Data

620

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 620 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolPhase 2: Implement results

0 Uwe R. Zimmer, The Australian National University page 620 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Coordinator instructs everybody to "Commit"

617

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 617 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolStart up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 617 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Setup & Start operations

614

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 614 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolStart up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 614 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

erver

Server Server

erver

erver

ver

Client

See e SerSe

ServerSeSe

Se

Se

rver SerSSSSSSSSS vSe

rverSer

ererveSeS

ver

verrv

rver

erve

rver

Distributed Transaction

621

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 621 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolPhase 2: Implement results

0 Uwe R. Zimmer, The Australian National University page 621 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Serv

Server

Client

Server

Server

Server

Server

Server

Server

Server

Serv

Server

Client

Coord.

ver

rv

rver

ver

rver

Everybody commits

618

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 618 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolStart up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 618 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Setup & Start operations

Shadow copy

615

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 615 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolStart up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 615 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

ver

rver

rver

ver

rver

Determine coordinator

628

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 628 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Premise:

A crashing server computer should not compromise the functionality of the system(full fault tolerance)

Assumptions & Means:

• k computers inside the server cluster might crash without losing functionality.

Replication: at least k 1+ servers.

• The server cluster can reorganize any time (and specifi cally after the loss of a computer).

Hot stand-by components, dynamic server group management.

• The server is described fully by the current state and the sequence of messages received.

State machines: we have to implement consistent state adjustments (re-organization) and consistent message passing (order needs to be preserved).

[Schneider1990]

625

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 625 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolor Phase 2: Global roll back

0 Uwe R. Zimmer, The Australian National University page 625 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Everybody destroys shadows

622

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 622 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolPhase 2: Implement results

0 Uwe R. Zimmer, The Australian National University page 622 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Everybody destroys shadows

629

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 629 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)

Stages of each server:

Job message received by all active servers

Job processed locallyJob message received locally

Received Deliverable

Processed

626

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 626 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolPhase 2: Report result of distributed transaction

0 Uwe R. Zimmer, The Australian National University page 626 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Clien

Coord.

ver

rver

rver

Server

Server

ntttt

C

ver

rverCoordinator reports to client: "Committed" or "Aborted"

623

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 623 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolPhase 2: Implement results

0 Uwe R. Zimmer, The Australian National University page 623 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Everybody reports "Committed"

630

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 630 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Start-up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 630 of y 758 (chapter 8: “Distributed Systems” up to page8

Start up (initialization) phase

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client Ring of identical servers

627

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 627 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Distributed transaction schedulersEvaluating the three major design methods in a distributed environment:

• Locking methods: No aborts.Large overheads; Deadlock detection/prevention required.

• Time-stamp ordering: Potential aborts along the way.Recommends itself for distributed applications, since decisions are taken locally and communication overhead is relatively small.

• “Optimistic” methods: Aborts or commits at the very end.Maximizes concurrency, but also data replication.

Side-aspect “data replication”: large body of literature on this topic (see: distributed data-bases / operating systems / shared memory / cache management, …)

624

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 624 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Two phase commit protocolor Phase 2: Global roll back

0 Uwe R. Zimmer, The Australian National University page 624 of y 758 (chapter 8: “Distributed Systems” up to pag8

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Server

Server

Server

Server

Server

Server

Server

Server

Server

Client

Coord.

ver

rver

rver

ver

rver

Coordinator instructs everybody to "Abort"

637

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 637 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Everybody (besides coordinator) processes

0 Uwe R. Zimmer, The Australian National University page 637 of y 758 (chapter 8: “Distributed Systems” up to page8

Everybody (besides coordinator) processes

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

All server detecttwo job-messages

☞ everybody processes job

634

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 634 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Distribute job

0 Uwe R. Zimmer, The Australian National University page 634 of y 758 (chapter 8: “Distributed Systems” up to page8

Distribute job

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

Coordinator sends job both ways

631

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 631 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Start-up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 631 of y 758 (chapter 8: “Distributed Systems” up to page8

Start up (initialization) phase

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client Determine coordinator

638

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 638 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Coordinator processes

0 Uwe R. Zimmer, The Australian National University page 638 of y 758 (chapter 8: “Distributed Systems” up to page8

Coordinator processes

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

Coordinator also received two messages

and processes job

635

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 635 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Distribute job

0 Uwe R. Zimmer, The Australian National University page 635 of y 758 (chapter 8: “Distributed Systems” up to page8

Distribute job

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

Everybody received job (but nobody knows that)

632

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 632 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Start-up (initialization) phase

0 Uwe R. Zimmer, The Australian National University page 632 of y 758 (chapter 8: “Distributed Systems” up to page8

Start up (initialization) phase

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

Coordinator determined

639

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 639 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Result delivery

0 Uwe R. Zimmer, The Australian National University page 639 of y 758 (chapter 8: “Distributed Systems” up to page8

Result delivery

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

Server

Server

ntttt

C

Coordinator delivers his local result

636

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 636 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Processing starts

0 Uwe R. Zimmer, The Australian National University page 636 of y 758 (chapter 8: “Distributed Systems” up to page8

Processing starts

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

First server detects two job-messages☞ processes job

633

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 633 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)Coordinator receives job message

0 Uwe R. Zimmer, The Australian National University page 633 of y 758 (chapter 8: “Distributed Systems” up to page8

Coordinator receives job message

Server

Server

Server

Server

Server Server

Server

Server

Server

Server

Client

Coord.

Server

Server

ntttt

C

Send Job

640

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 640 of 758 (chapter 8: “Distributed Systems” up to page 641)

Distributed Systems

Redundancy (replicated servers)

Event: Server crash, new servers joining, or current servers leaving.

Server re-confi guration is triggered by a message to all (this is assumed to be supported by the distributed operating system).

Each server on reception of a re-confi guration message:

1. Wait for local job to complete or time-out.

2. Store local consistent state Si.

3. Re-organize server ring, send local state around the ring.

4. If a state Sj with j i> is received then S Si j%

5. Elect coordinator

6. Enter ‘Coordinator-’ or ‘Replicate-mode’

641

Distributed Systems

© 2020 Uwe R. Zimmer, The Australian National University page 641 of 758 (chapter 8: “Distributed Systems” up to page 641)

Summary

Distributed Systems

• Networks• OSI, topologies

• Practical network standards

• Time• Synchronized clocks, virtual (logical) times

• Distributed critical regions (synchronized, logical, token ring)

• Distributed systems• Elections

• Distributed states, consistent snapshots

• Distributed servers (replicates, distributed processing, distributed commits)

• Transactions (ACID properties, serializable interleavings, transaction schedulers)