Fail-safe Communication Layer for DisplayWall
Yuqun Chen
DisplayWall Software Architecture
[Figure: a master node and several render nodes connected by a logical network. The numbered labels mark three kinds of traffic: command broadcast, synchronization, and data exchange.]
Motivation
• Complex communication patterns
  – programming the DisplayWall is difficult
• BitBlt operation in the Virtual Display Driver (VDD)
• Nodes and network links do fail
  – a larger system is more likely to fail
  – the OS may not be stable under high load
  – applications have bugs
Design Goals [1]
• Ease writing distributed applications on DisplayWall
  – support some form of group concept
  – multicast or broadcast
  – no need to manage pair-wise connections
Design Goals [2]
• An API for designing fail-safe DisplayWall apps
  – tolerate independent failures and recovery of render nodes
  – failures at the application master nodes are considered catastrophic
  – certain bugs cause all render nodes to fail
    • we may or may not be able to deal with this
What’s different from other systems?
• API-wise
  – mostly broadcast
  – some pair-wise exchange
• Fault-tolerance-wise
  – real-time characteristic
    • cannot wait too long
  – OK to lose certain messages
    • dropping a few frames is OK
Some Requirements
• Simple abstraction
  – users shouldn’t deal with pair-wise connections
• Realizable on a variety of platforms
  – with and without a programmable network interface (NI)
• Support storage and retrieval of application-dependent states (soft states)
• Synchronized clocks and barriers
Communication Patterns
• Command/data delivery
  – from master to some render nodes
  – broadcast in nature
• Data exchange
  – among render nodes, e.g., bitblt and v-tiling
  – pair-wise in nature
• Synchronization: clock and barriers
  – low-latency
Outline
• Communication patterns
• Soft States
• API issues
Command Broadcast
• Used by all applications
  – VDD, OpenGL, ImageViewer
• Issues:
  – efficiency
  – dynamic membership
    • live/dead nodes, overlapped windows
  – delivery semantics
    • best-effort, guaranteed, or sloppy
Guaranteed or Best-effort?
• Guaranteed delivery implies
  – a re-configurable (logical) topology for delivering data
  – lossless delivery to all nodes
    • an intermediate node may fail right after delivery, so the upstream node has to keep all the data for retransmission
• Best-effort
  – delivers as far as the current topology allows
  – or with limited flexibility
Transactions?
• Each broadcast is treated as a transaction
  – the data is kept until the transaction commits
• Transactions are asynchronous
  – one doesn’t wait until the previous one commits
  – but they are applied in order
• Applied transactions mark the state at each node
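A minimal Python sketch of the transaction idea above, assuming each broadcast carries an increasing sequence number and commits once every live receiver has acknowledged it. `Sender`, `Receiver`, `ack`, and `node_failed` are hypothetical names, not part of the original design. The sender retains data until commit; the receiver buffers out-of-order arrivals and applies them strictly in order, so the last applied ID marks its state.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Master side (hypothetical sketch): keeps each broadcast until it commits."""
    receivers: set
    next_id: int = 0
    pending: dict = field(default_factory=dict)  # txn_id -> (data, nodes yet to ack)

    def broadcast(self, data):
        # Asynchronous: we do not wait for earlier transactions to commit.
        txn_id = self.next_id
        self.next_id += 1
        if self.receivers:
            self.pending[txn_id] = (data, set(self.receivers))
        return txn_id

    def ack(self, node, txn_id):
        if txn_id not in self.pending:
            return
        _, waiting = self.pending[txn_id]
        waiting.discard(node)
        if not waiting:
            del self.pending[txn_id]  # committed: the retained data can be released

    def node_failed(self, node):
        # A dead node must not block commits of outstanding transactions.
        self.receivers.discard(node)
        for txn_id in list(self.pending):
            self.ack(node, txn_id)


class Receiver:
    """Render-node side (hypothetical sketch): applies transactions in order."""

    def __init__(self):
        self.applied = []
        self.next_expected = 0
        self.buffered = {}

    def deliver(self, txn_id, data):
        if txn_id >= self.next_expected:  # ignore duplicates of applied txns
            self.buffered[txn_id] = data
        while self.next_expected in self.buffered:
            self.applied.append(self.buffered.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected - 1  # last applied ID marks this node's state
```

In this sketch a failed node is simply dropped from every outstanding transaction; how a recovering node catches up is a separate question (see the soft-state slides).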
Fail-safe Broadcast
[Figure: fail-safe broadcast among nodes numbered 1 to 4.]
recv(msg)
send msg to children c1, c2, …, cn
if child c failed then
    recompute c’s subtree without c
    send msg to the new child cc
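The forwarding rule above can be sketched in Python as follows. This is one possible repair policy, not necessarily the author's: the failed child's first child (`cc`) takes its place and adopts its former siblings. The tree is a hypothetical `dict` of child lists, and failure detection is simulated by an `alive` set.

```python
def repair(tree, parent, failed):
    """Recompute failed's subtree without it; return the new child cc (or None)."""
    orphans = tree.pop(failed, [])
    kids = tree[parent]
    i = kids.index(failed)
    if not orphans:
        kids.pop(i)  # failed was a leaf: just drop it
        return None
    cc, rest = orphans[0], orphans[1:]
    kids[i] = cc                          # cc takes the failed node's place
    tree.setdefault(cc, []).extend(rest)  # cc adopts its former siblings
    return cc


def broadcast(tree, node, msg, alive, delivered):
    """recv(msg) at node, then forward msg to each (possibly repaired) child."""
    delivered.append(node)
    for c in list(tree.get(node, [])):
        while c is not None and c not in alive:
            c = repair(tree, node, c)  # the replacement may itself be dead
        if c is not None:
            broadcast(tree, c, msg, alive, delivered)
```

Because the repair is done locally by the parent, only the failed node's subtree is reshaped; the rest of the tree is untouched.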
Questions
• How to detect failure in a timely fashion?
  – a global solution
    • the leaves send acks to a master
    • the master forces a global reconfiguration after a timeout
  – a local solution
    • periodic positive ACKs to the parent
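The local solution can be sketched as a parent-side detector in Python. Each child is assumed to send a positive ACK every period, where the period is shorter than `timeout`. A child is suspected once its last ACK is older than `timeout`. `HeartbeatMonitor` is a hypothetical name. Times are passed in explicitly; in practice they would come from something like `time.monotonic()`.

```python
class HeartbeatMonitor:
    """Parent-side failure detector driven by periodic positive ACKs (sketch)."""

    def __init__(self, children, timeout, now):
        self.timeout = timeout
        self.last_ack = {c: now for c in children}  # child -> time of last ACK

    def on_ack(self, child, now):
        self.last_ack[child] = now

    def failed(self, now):
        """Children whose last ACK is older than the timeout."""
        return sorted(c for c, t in self.last_ack.items()
                      if now - t > self.timeout)
```

The choice of timeout trades detection latency against false suspicion under load, which matters here given the "cannot wait too long" requirement.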
Data Exchange
• Pair-wise sends and recvs
• High-level issues:
  – prevent a recv from getting stuck on a failing sender
    • timeout or periodic “I am alive” acks
• Implementation issues:
  – neighborhood communication?
    • probably sufficient for most apps except for load balancing
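A minimal Python sketch of a pair-wise recv that cannot get stuck on a failing sender. It uses per-sender queues and `queue.Queue.get` with a timeout, returning `None` when the sender stays silent. `Mailbox` and `deliver` are hypothetical; the in-process queues stand in for the network.

```python
import queue
from collections import defaultdict


class Mailbox:
    """Per-sender inboxes; recv gives up after a deadline instead of blocking."""

    def __init__(self):
        self.inbox = defaultdict(queue.Queue)  # sender id -> queue of messages

    def deliver(self, sender, data):
        # Called by the transport when a message from `sender` arrives.
        self.inbox[sender].put(data)

    def recv(self, sender, timeout):
        """Return the next message from sender, or None after `timeout` seconds."""
        try:
            return self.inbox[sender].get(timeout=timeout)
        except queue.Empty:
            return None  # caller treats the sender as failed or late
```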
Synchronization
• Barrier
  – a special form of broadcast
  – mostly used for global frame-buffer swap
• Clock synchronization
  – can be used to reduce the frequency of barriers
  – e.g., MPEG playback according to the local clock
    • what if it misses the deadline?
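One way to play video by clock instead of by barrier is sketched below in Python. It assumes the node clocks are already synchronized and that all nodes share the same `start` time and frame rate. Each node computes which frame is due from its own clock. For a missed deadline, the sketch skips ahead and drops frames, in line with "dropping a few frames is OK". That policy is an assumption, and `frame_due`/`next_frame` are hypothetical names.

```python
import math


def frame_due(now, start, fps):
    """Index of the frame every node should be showing at synchronized time `now`."""
    return math.floor((now - start) * fps)


def next_frame(last_shown, now, start, fps):
    """Return (frame to show next, frames dropped)."""
    due = frame_due(now, start, fps)
    if due <= last_shown:
        return last_shown + 1, 0           # on time: wait for the next deadline
    return due, due - last_shown - 1       # late: jump to the due frame
```

With this policy a slow node falls back into step at the next frame boundary, without a barrier round-trip.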
Unification
• Transaction-based messages implement broadcast
  – they provide some form of global ordering
• Message passing for pair-wise data exchange
  – the key is to detect failures quickly
Example 1: VDD BitBlt
BitBlt:
    calculate data to send
    calculate data to recv
    read frame buffer
    send data to some nodes
        if failure then take note
    recv data from some nodes
        if failure then use default image
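The BitBlt steps above are sketched here as a Python function for one render node. The frame buffer is modeled as a list of pixel rows. `send(peer, data) -> bool` and `recv(peer) -> data or None` are hypothetical transport calls that report failure instead of blocking. `bitblt_exchange` and the region format `(x, y, w, h)` are also assumptions.

```python
def bitblt_exchange(framebuffer, sends, recvs, send, recv, default_tile):
    """One BitBlt exchange step on a render node (hypothetical sketch).

    sends: {peer: (x, y, w, h)} regions of our frame buffer that peers need
    recvs: peers we expect a tile from
    """
    failed = []
    for peer, (x, y, w, h) in sends.items():
        tile = [row[x:x + w] for row in framebuffer[y:y + h]]  # read frame buffer
        if not send(peer, tile):
            failed.append(peer)  # take note and keep going
    tiles = {}
    for peer in recvs:
        data = recv(peer)
        tiles[peer] = data if data is not None else default_tile  # default image
    return tiles, failed
```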
Failure Recovery
• What happens when a failed render node comes back up?
  – it puts itself into the broadcast tree
  – it re-establishes peer-to-peer message connections
    • hopefully all hidden from the application
  – it has to bring its states up to date
    • highly application-dependent
Soft States Example: OpenGL
• Display List
  – each list consists of a series of GL commands
    • must be re-executed to make the list meaningful
• Textures
  – may be bound to a texture name
Soft States API
• Tagged Safe Memory
  – a chunk of memory replicated on all nodes
  – tagged and ordered by an ID
  – associated with a recovery handle/function
• Operations: create, insert, and delete
• Upon recovery:
  – retrieve all “live” chunks and apply their handles in order
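A single-node Python sketch of Tagged Safe Memory follows. Replication to the other nodes is not shown. The slides do not define the operations precisely, so this sketch makes three assumptions. "Create" is taken as constructing the store. "Insert" adds a chunk tagged with the next ID plus its recovery handler. "Delete" drops a chunk. `TaggedSafeMemory` and `recover` are hypothetical names.

```python
class TaggedSafeMemory:
    """One node's copy of the replicated soft-state store (hypothetical sketch)."""

    def __init__(self):  # "create"
        self._chunks = {}  # chunk id -> (data, recovery handler)
        self._next_id = 0

    def insert(self, data, handler):
        chunk_id = self._next_id  # IDs give the chunks a global order
        self._next_id += 1
        self._chunks[chunk_id] = (bytes(data), handler)
        return chunk_id

    def delete(self, chunk_id):
        self._chunks.pop(chunk_id, None)

    def recover(self):
        """Apply the handler of every live chunk, in ID order."""
        for chunk_id in sorted(self._chunks):
            data, handler = self._chunks[chunk_id]
            handler(data)
```

For the OpenGL case, a handler would re-execute a display list's commands or re-bind a texture. Replaying in ID order keeps dependencies (e.g., a texture created before a list that uses it) intact.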
Example 2: VDD recovery
• Re-establish connections between nodes
• Restore states:
  – cached bitmaps from other running nodes
  – fonts and brushes from other nodes
• Or, force a re-draw from the master nodes
Messages?
• Transactions can be implemented on top of messages
  – add a transaction ID to each message
• People are familiar with message passing
  – as opposed to remote memory access
    • you have to manage memory, pretty messy
• What about copy avoidance?
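The idea of layering transactions on messages by adding a transaction ID can be sketched in Python with a fixed wire header. The layout is an assumption, not a format from the slides. It contains a 16-bit message type, a 32-bit transaction ID, and a 32-bit payload length in network byte order, packed with the standard `struct` module.

```python
import struct

# Hypothetical header: message type, transaction ID, payload length.
HEADER = struct.Struct("!HII")


def pack_message(msg_type, txn_id, payload):
    return HEADER.pack(msg_type, txn_id, len(payload)) + payload


def unpack_message(buf):
    msg_type, txn_id, length = HEADER.unpack_from(buf, 0)
    payload = bytes(buf[HEADER.size:HEADER.size + length])
    return msg_type, txn_id, payload
```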
Communication API
• Message Passing Interface
  – send(node, type, data), recv(node, type, data)
  – only need to specify a remote node ID
  – the connection is hidden from the API
  – copy can be avoided by returning a buffer pointer instead of filling the user buffer
  – very close to the sockets API but more flexible
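An in-process Python sketch of this interface appears below. Callers name only a remote node ID and a message type. The per-peer connection is created lazily on first use and never exposed. Following the copy-avoidance point, `recv` returns a read-only `memoryview` of the received buffer instead of filling a user buffer, so its signature differs from the slide's `recv(node, type, data)`. `Endpoint` and the `network` dict that stands in for the wire are hypothetical.

```python
from collections import defaultdict, deque


class Endpoint:
    """Node-addressed send/recv with hidden connections (hypothetical sketch)."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network             # node id -> Endpoint (stands in for the wire)
        self.connections = {}              # hidden per-peer connection state
        self.queues = defaultdict(deque)   # (source node, type) -> received buffers

    def _connection(self, node):
        if node not in self.connections:   # connect lazily; the caller never sees this
            self.connections[node] = self.network[node]
        return self.connections[node]

    def send(self, node, msg_type, data):
        self._connection(node).queues[(self.node_id, msg_type)].append(bytes(data))

    def recv(self, node, msg_type):
        """Return a view of the next buffer from node, or None if none is queued."""
        q = self.queues[(node, msg_type)]
        return memoryview(q.popleft()) if q else None
```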
Copy Avoiding Message Passing
• Trivial
  – just return a pointer to the buffer
• Remote memory semantics not necessary
  – very rare to update a remote data structure
  – memory copy isn’t that bad (> 200 MB/sec)
• What about remote bitblt?
  – the only missing part is peer-to-peer
    • which we can’t do anyway
Copy Avoiding Messages
[Figure: core logic, a global buffer, the NIC, and the graphics card; a new message (header plus data) in the global buffer is handed to recv(msg).]
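The copy-avoiding receive path can be sketched in Python as follows. A `bytearray` stands in for the global buffer that the NIC deposits length-prefixed messages into. `recv` returns a `memoryview` slice pointing at the payload in place, so no copy is made. `GlobalBuffer`, `nic_deposit`, and the 4-byte length header are assumptions, and the buffer is not reused as a ring.

```python
import struct

LEN = struct.Struct("!I")  # hypothetical header: payload length only


class GlobalBuffer:
    """Receive buffer the NIC writes into; recv hands out views (sketch)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.write = 0  # where the NIC deposits the next message
        self.read = 0   # start of the next unread message

    def nic_deposit(self, payload):
        end = self.write + LEN.size + len(payload)
        if end > len(self.buf):
            raise BufferError("global buffer full")
        LEN.pack_into(self.buf, self.write, len(payload))
        self.buf[self.write + LEN.size:end] = payload
        self.write = end

    def recv(self):
        """Return a pointer-like view of the next payload; nothing is copied."""
        if self.read == self.write:
            return None
        (n,) = LEN.unpack_from(self.buf, self.read)
        start = self.read + LEN.size
        self.read = start + n
        return memoryview(self.buf)[start:start + n]
```

The trade-off the slides hint at is visible here. The caller gets the data for free, but the buffer region cannot be reused until the caller is done with the view.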