CSS434 DSM1 CSS434 Distributed Shared Memory Textbook Ch18 Professor: Munehiro Fukuda.

26
CSS434 DSM 1 CSS434 Distributed Shared CSS434 Distributed Shared Memory Memory Textbook Ch18 Textbook Ch18 Professor: Munehiro Fukuda
  • date post

    22-Dec-2015
  • Category

    Documents

  • view

    231
  • download

    0

Transcript of CSS434 DSM1 CSS434 Distributed Shared Memory Textbook Ch18 Professor: Munehiro Fukuda.

CSS434 DSM 1

CSS434 Distributed Shared MemoryCSS434 Distributed Shared MemoryTextbook Ch18Textbook Ch18

Professor: Munehiro Fukuda

CSS434 DSM 2

Basic Concept

Communication Network

CPU 1

CPU n

MMUPage Mgr

: Memory

Node 0

CPU 1

CPU n

MMUPage Mgr

: Memory

Node 1

CPU 1

CPU n

MMUPage Mgr

: Memory

Node 2

Distributed Shared Memory(exists only virtually)

Data = read(address); write(address, data);

address

A cache line or a page is transferred to and cached inthe requested computer.

CSS434 DSM 3

Writer Process on DSM#include "world.h" struct shared { int a,b; };

Program Writer:main(){

int x;struct shared *p;methersetup(); /* Initialize the Mether run-time */p = (struct shared *)METHERBASE;

/* overlay structure on METHER segment */p->a = p->b = 0; /* initialize fields to zero */while(TRUE) { /* continuously update structure fields */

p –>a = p –>a + 1;p –>b = p –>b - 1;

}}

CSS434 DSM 4

Reader Process on DSM

Program Reader:main(){

struct shared *p;methersetup();p = (struct shared *)METHERBASE;while(TRUE) { /* read the fields once every second */

printf("a = %d, b = %d\n", p –>a, p –>b);sleep(1);

}}

CSS434 DSM 5

Why DSM? Simpler abstraction

Underlying tedious communication primitives are all shielded by memory accesses

Better portability of distributed application programs Natural transition from sequential to distributed application

Better performance of some applications Data locality, one-demand data movement, and large memory

space reduce network traffic and paging/swapping activities. Flexible communication environment

Sender and receiver have no need to know each other. They even need not coexist.

Ease of process migration Migration is completed only by transferring the corresponding

PCB to the destination.

CSS434 DSM 6

Main Issues Granularity

Fine (less false sharing but more network traffic) Cache line (e.g. Dash and Alewife), Object (e.g. Orca and Linda), Page (e.g. Ivy) Coarse(more false sharing but less network traffice)

Memory coherence and access synchronization Strict, Sequential, Causal, Weak, and Release Consistency models

Data location and access Broadcasting, centralized data locator, fixed distributed data locator, and

dynamic distributed data locator Replacement strategy

LRU or FIFO (The same issue as OS virtual memory) Thrashing

How to prevent a block from being exchanged back and forth between two nodes.

Heterogeneity

CSS434 DSM 7

Consistency ModelsTwo processes accessing shared variables

a := a + 1;b := b + 1;

br := b;ar := a;if(ar ≥ br) then print ("OK");

Process 1 Process 2At the beginning a = b = 0;

a == 1b == 1

Condition satisfied

a == 1b == 0

Condition satisfied

b == 1a == 0

This may happen if new contents are transmittedthrough a different route.

DSM needs a consistency model.

CSS434 DSM 8

Consistency ModelsStrict Consistency

Wi(x, a): Processor i writes a on variable x, (i.e., x = a;). bRi(x): Processor i reads b from variable x. (i.e., y = x; && y == b;). Any read on x must return the value of the most recent write on x.

Strict Consistency Not Strict Consistency

P1 P2 P3 P1 P2 P3

W2(x, a)

aR1(x)

aR3(x)

W2(x, a)

nilR1(x)

aR3(x)aR1(x)

aR1(x)

CSS434 DSM 9

Consistency ModelsLinearizability and Sequential Consistency

Linearlizability: Operations of each individual process appear to all processes in the same order as they happen.

Sequential Consistency: Operations of each individual process appear in the same order to all processes.

Linearlizability Sequential ConsistencyP1 P2 P3

W2(x, a)

aR1(x)

bR1(x)

P4

aR4(x)

W3(x, b)

bR4(x)

P1 P2 P3

W2(x, a)

bR1(x)

aR1(x)

P4

bR4(x)

W3(x, b)

aR4(x)

Nil <-R1(x)

CSS434 DSM 10

Consistency ModelsFIFO and Processor Consistency

FIFO Consistency: writes by a single process are visible to all other processes in the order in which they were issued.

Processor Consistency: FIFO Consistency + all write to the same memory location must be visible in the same order.

FIFO Consistency Processor Consistency

P1 P2 P3

W2(x, b)

aR1(x)0R1(x)

P1 P2 P3W2(x, a)

W3(x, 1)

W3(x, 0)W2(x, b)W2(x, a)

W3(y, 1)

W3(y, 0)

P4

P4

1R1(x)

bR1(x)

aR1(x)0R1(x)bR1(x)

1R1(z)

W2(y, a) W3(z, a)

aR1(y)

1R1(x)

1R1(z) aR1(y)

W2(y, a)W3(z, 1)

aR1(x)0R1(y)1R1(y)

bR1(x)

aR1(y)1R1(z)

aR1(x)0R1(y)1R1(y)bR1(x)

aR1(y)

1R1(z)

CSS434 DSM 11

Consistency ModelsCausal Consistency

Causally related write must be visible to all processes in the same order. Concurrent writes may be propagated in a different order.Causal Consistency Not Causal Consistency

P1 P2 P3

bR4(x)cR1(x)

P4P1 P2 P3 P4

W2(x, a)

aR3(x)

W3(x, b)

bR1(x) cR4(x)

W2(x, c)

aR4(x)aR3(x)

aR1(x)

W2(x, a)

aR3(x)

W3(x, b)

bR1(x)

bR4(x)

aR4(x)

CSS434 DSM 12

Consistency ModelsWeak Consistency

Accesses to synchronization variables must obey sequential consistency.

All previous writes must be completed before an access to a synchronization variable.

All previous accesses to synchronization variables must be completed before access to non-synchronization variable.Weak Consistency Not Weak Consistency

P1 P2 P3 P1 P2 P3W2(x, a)W2(x, b)

W2(y, c)

S2S1

S3

bR4(x)cR4(y)

cR4(y)bR4(x)

W2(x, a)

W2(x, b)W2(y, c)

S2

S1

S3

aR4(x)cR4(y)

bR4(x)cR4(y)

aR4(x)NilR4(y)

bR4(x)

CSS434 DSM 13

Consistency ModelsRelease Consistency

Access to acquire and release variables obey processor consistency.

Previous acquires requested by a process must be completed before the process performs a data access.

All previous data accesses performed by a process must be completed before the process performs a release.

P1 P2 P3

aR3(x)

Acq1(L)W1(x, a)

W1(x, b)Rel1(L)

Acq2(L)

Rel2(L)

bR2(x)

bR2(x)

CSS434 DSM 14

Process 1: acquireLock(); // enter critical sectiona := a + 1;b := b + 1;releaseLock(); // leave critical section

Process 2: acquireLock(); // enter critical sectionprint ("The values of a and b are: ", a, b);releaseLock(); // leave critical section

Consistency ModelsRelease Consistency (Example)

CSS434 DSM 15

Implementing Sequential Consistency

Replicated and Migrating Data Blocks

memory

cache

x

y

Processor

memory

cache

m

n

Processor

Node 1

memory

cache

a

b

Processor

Node 2 Node 3

mbx xDuplicate

Then what if Node 2 updates x?

CSS434 DSM 16

Implementing Sequential ConsistencyWrite Invalidation

a copy ofblock

blocka copy of

block

Client wants to write:

1. Request block

new copy

2. Replicate block

new copy

3. Invalidate block3. Invalidate block

CSS434 DSM 17

Implementing Sequential Consistency

Write Update

a copy ofblock

blocka copy of

block

Client wants to write:

1. Request block

new copy

2. Replicate block

new copy

3. Update block3. Update block

new copy new copy new copy

CSS434 DSM 18

Implementing Sequential Consistency

Read/Write Request

Unused

Writable

Read only Nil

Read-owned

Read(Read a copy from the onwer)

Read(Read from memory and get an ownership)

Write(invalidate others if they have a copy

and get an ownership)

Write(invalidate others if they have a copy)

Write(invalidate others if they have a copy

and get an ownership)

Write invalidate

Write invalidate

Write invalidate

ReplacementReplacement

Replacement

Replacement

CSS434 DSM 19

Implementing Sequential Consistency

Locating Data –Fixed Distributed-Server Algorithms

Address

Owner

0 P0

1 P0

2 P2

Addr0writable

Addr1read owned

Addr5writable

Addr3read owned

Addr7writable

Addr2read owned

Addr6writable

Addr8read owned

Address

Owner

6 P2

7 P1

8 P2

Address

Owner

3 P1

4 P2

5 P0

Addr4read owned

Processor 0 Processor 1 Processor 2

Read addr2

Addr2read only

Location search

Read request

Block replication

CSS434 DSM 20

Implementing Sequential Consistency

Locating Data – Dynamic Distributed-Server Algorithms

Address

Probable

0 P0

1 P0

2 P2

Addr0writable

Addr1read owned

Addr5writable

Addr3read owned

Addr7writable

Addr2read owned

Addr8read owned

Address

Probable

2 P2

7 P1

8 P2

Address

Probable

3 P1

4 P2

5 P0

Addr4read owned

Processor 0 Processor 1 Processor 2

Read addr2

Addr2read owned

Addr2read only

Location search

Read request

Block replication

p1

p1

Breaking the chain of nodes:

When the node receives an invalidation

When the node relinquishes ownership

When the node forwards a fault request

The node points to a new owner

CSS434 DSM 21

Replacement Strategy Which block to replace

Non-usage based (e.g. FIFO) Usage based (e.g. LRU) Mixed of those (e.g. Ivy )

Unused/Nil: replaced with the highest priority Read-only: the second priority Read-owned: the third priority Writable: the lowest priority and LRU used.

Where to place a replaced block Invalidating a block if other nodes have a copy. Using secondary store Using the memory space of other nodes

CSS434 DSM 22

Thrashing Thrashing:

Two or more processes try to write the same shared block. An owner keeps writing its block shared by two or more

reader processes. The larger a block, the more chances of false sharing that

causes thrashing. Solutions:

Allow a process to prevent a block from accessed from the others, using a lock.

Allow a process to hold a block for a certain amount of time. Apply a different coherence algorithm to each block.

What do those solutions require users to do? Are there any perfect solutions?

CSS434 DSM 23

Paper Review by Students IVY Dash Munin Linda/Jini/JavaSpace Discussions:

Classify which system is based on sequential consistency, release consistency, and lazy release consistency.

Classify the shared data granularity of these systems: cache-line based, page-based, and object-based.

Classify the implementation of these systems: hardware implementation, OS implementation, and User-level implementation.

CSS434 DSM 24

Non-Turn-In Exercises1. Is the memory underlying the following

execution of two processes sequentially consistent (assuming that, initially, all variables are set to zero)? P1:R(x)1; R(x)2; W(y)1

1. P2: W(x)1; R(y)1; W(x)22. Show that the following history is not

causally consistent. 1. P1: W(a)0; W(a)12. P2: R(a)1; W(b)23. P3: R(b)2; R(a)0

3. Explain the relationship between false sharing and data granularity in DSM.

CSS434 DSM 25

Non-Turn-In ExercisesProcessor 1

ownership table

addr

0

1

2

owner

P0

P0

P3

shared

P3

addr0 addr

1

event

Processor 2ownership table

addr

3

4

5

owner

P2

P3

P0

shared

addr3 addr

7

addr8

Processor 3ownership table

addr

6

7

8

owner

P3

P2

P2

shared

addr2 addr

4

addr6

data items data items data items

copyaddr1

4

4. There is a DSM system that is based on the write-invalidation protocol, uses a fixed distributed-server algorithm for locating a given data item, and consists of three processors such as 1, 2, and 3. Each processor has the following data items and an ownership/sharing-processor table.

CSS434 DSM 26

Non-Turn-In ExercisesGiven the following sequence of memory accesses, draw additional arrows and circles in the above figure as instructed. To

distinguish which arrow corresponds to which operation, add the operation number 1 – 8 to each arrow. Also, update the corresponding ownership table entries.

(1) Memory access #1: Processor 2 reads data from address 2.Add arrows in the above figure to indicate operations required for the memory access #1.

1. Send a query to search for the address 22. Send a request to read from the address 23. Read data from the address 2 to Processor 2

Update the corresponding ownership table entry. (Just add P2 in the “share” field.)Draw a circle to indicate that a copy of address 2 was created on Processor 2. (2) Memory access #2: Processor 1 reads data from address 2.Add arrows in the above figure to indicate operations required for the memory access #2.

4. Send a query to search for the address 25. Send a request to read from the address 26. Read data from the address 1 to Processor 2

Update the corresponding ownership table entry. (Just add P1 in the “share” field.)Draw a circle to indicate that a copy of address 2 was created on Processor 1.

(3) Memory access #3: Processor 2 writes data to address 2.Add arrows in the above figure to indicate operations required for the memory access #3.

7. Send a request to update the ownership information on the address 28. Send a write invalidation to all non-owner processors sharing the address 2

Update the corresponding ownership table entry. (Make Processor 2 a new owner of address 2 and cross out all other processor Ids in the entry.)

Cross out all circles to indicate that old copies of address 2 were all invalidated.