Project Mimir


Transcript of Project Mimir

Page 1: Project Mimir

Project Mimir

• A Distributed Filesystem

• Uses Rateless Erasure Codes for Reliability

• Uses Pastry's Multicast System, Scribe, for Resource Discovery and Utilization

Page 2: Project Mimir

Erasure Codes

• An erasure code transforms a message of n blocks into a message with more than n blocks, such that the original message can be recovered from a subset of those blocks. The fraction of the blocks required is called the rate, denoted r.
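As a minimal sketch of this definition (illustrative only, not Mimir's code): a single XOR parity block turns n source blocks into n + 1 encoded blocks, any n of which recover the original, for a rate of r = n/(n + 1):

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_with_parity(blocks):
    """n source blocks -> n + 1 encoded blocks (one XOR parity block)."""
    return blocks + [reduce(xor, blocks)]

def recover_missing(received):
    """With exactly one block missing, XOR of the n received blocks
    reconstructs it (any n of the n + 1 blocks suffice)."""
    return reduce(xor, received)

blocks = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]          # n = 3
encoded = encode_with_parity(blocks)                      # m = 4, rate r = 3/4
survivors = [b for i, b in enumerate(encoded) if i != 1]  # lose block 1
assert recover_missing(survivors) == blocks[1]
```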

Page 3: Project Mimir

Optimal Erasure Codes

Page 4: Project Mimir

Rateless Erasure Codes

• Also called Fountain Codes

• Are rateless because they can produce an infinite stream of encoded data.

• Most rateless codes are sub-optimal: a file of n blocks can be decoded from any m encoded blocks, where m ≥ (1 + ε)n.

Page 5: Project Mimir

Luby Transform

• Low density

• First rateless erasure code discovered

• Encodes blocks by randomly selecting a degree d (1 ≤ d ≤ n) and then using the XOR operation on d random un-encoded blocks.

• Decodes blocks by the principle that
– A XOR B = C
– A XOR C = B
– B XOR C = A
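The encode/decode procedure above can be sketched in Python (a toy illustration: degrees are drawn uniformly here, whereas a real LT code draws them from the robust soliton distribution):

```python
import random

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def lt_encode(blocks, rng):
    """One encoded block: pick a degree d (1 <= d <= n), then XOR d
    randomly chosen source blocks together."""
    d = rng.randint(1, len(blocks))        # uniform degree (toy choice)
    idx = frozenset(rng.sample(range(len(blocks)), d))
    payload = b"\x00" * len(blocks[0])
    for i in idx:
        payload = xor(payload, blocks[i])
    return idx, payload

def lt_decode(n, encoded):
    """Peeling decoder: find a block with one unknown source, recover it
    via the XOR identities (A ^ B = C implies A ^ C = B), repeat."""
    known = {}
    progress = True
    while progress and len(known) < n:
        progress = False
        for idx, payload in encoded:
            pending = set(idx) - set(known)
            if len(pending) == 1:
                i = pending.pop()
                for j in set(idx) - {i}:
                    payload = xor(payload, known[j])
                known[i] = payload
                progress = True
    return [known.get(i) for i in range(n)]

# Deterministic demo of the peeling step:
enc = [(frozenset({0}), b"A"),
       (frozenset({0, 1}), xor(b"A", b"B")),
       (frozenset({0, 1, 2}), xor(xor(b"A", b"B"), b"C"))]
assert lt_decode(3, enc) == [b"A", b"B", b"C"]
```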

Page 6: Project Mimir
Page 7: Project Mimir

Encoded Block Identification

The decoder must be able to identify an encoded block's degree and which blocks were used to encode it.

• Pass the seed used in the random number generator to regenerate an identical encoding.
– Additional decoder overhead
– All encoders and decoders must use the same random number generators

• Attach binary headers to encoded blocks where each bit represents whether a specific block was used in the encoding.
– Additional network overhead
– Header bit length equals n
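The seed approach can be sketched as follows (assuming, for illustration, that Python's random module is the generator every encoder and decoder shares):

```python
import random

def choose_blocks(seed, n):
    """Derive (degree, block indices) purely from the seed. Encoder and
    decoder run this identically, so only the seed travels with the
    payload instead of an n-bit header."""
    rng = random.Random(seed)
    d = rng.randint(1, n)
    return rng.sample(range(n), d)

# The decoder regenerates exactly what the encoder chose:
assert choose_blocks(12345, 8) == choose_blocks(12345, 8)
```

The trade-offs on the slide are visible here: the decoder pays extra CPU to replay the generator, and the scheme silently breaks if any peer uses a different PRNG.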

Page 8: Project Mimir

Common Uses of LT Codes

• One-way communication protocols over noisy channels
– File streaming (IPTV and other media)
– Long-distance communication
– Satellite communications
– Mobile phones
– High-latency networks

Page 9: Project Mimir

LT Codes in Mimir

• Encoded blocks striped evenly across a large network of computers.

• Generate x·n encoded blocks; successful decoding is guaranteed even when a fraction 1 − (2/x) of all network nodes fail.

• Distributed disk space equals real disk space / x
– 50 computers × 100 GB each = 5 TB
– x = 4; distributed disk space = 1.25 TB
– 100% reliability while at least 25 computers are online
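The slide's figures check out arithmetically (x = 4 is the redundancy factor):

```python
computers, disk_per_node_gb = 50, 100
x = 4                                         # generate x*n encoded blocks

raw_tb = computers * disk_per_node_gb / 1000  # 5.0 TB of real disk
usable_tb = raw_tb / x                        # 1.25 TB of distributed space
survivors_needed = computers * 2 / x          # 25 nodes must stay online
failure_tolerance = 1 - 2 / x                 # up to 50% of nodes may fail
assert (raw_tb, usable_tb, survivors_needed, failure_tolerance) == (5.0, 1.25, 25.0, 0.5)
```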

Page 10: Project Mimir
Page 11: Project Mimir

Challenges

• Cannot encode new data unless the file is reconstructed first; a high-churn network therefore requires a lot of computation

• Decoding is still probabilistic, although the probability of success is extremely high: greater than 99.99999% at the failure limit

Page 12: Project Mimir

Modifications to LT Code

• Modified the LT Code to guarantee each block is encoded an equal number of times (evenly saturated). RobuStore also does this.

• Evenly distribute according to block degree

• Modified distribution
– spiking – n unique blocks with degree 1
– offset distribution
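One plausible reading of "evenly saturated" (an assumption on my part, not necessarily the authors' exact scheme): always draw from the least-used source blocks, so every block's usage count stays level:

```python
import random

def saturated_select(use_counts, d, rng):
    """Pick d source blocks, preferring the least-used ones so coverage
    stays even ('evenly saturated'); ties are broken randomly."""
    order = sorted(range(len(use_counts)),
                   key=lambda i: (use_counts[i], rng.random()))
    chosen = order[:d]
    for i in chosen:
        use_counts[i] += 1
    return chosen

rng = random.Random(0)
counts = [0] * 6
for _ in range(12):
    saturated_select(counts, 2, rng)  # 12 draws * degree 2 = 24 uses
assert max(counts) - min(counts) == 0  # every block used exactly 24/6 = 4 times
```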

Page 13: Project Mimir

Application-Level Anycast/Multicast

• Issues with network-level multicast

• Uses for multicast
– Publish/subscribe architecture
– Resource advertisement/discovery
– Mass content distribution

Page 14: Project Mimir

Issues with network level multicast

• Difficult to set up

• Does not handle large numbers of multicast and anycast groups

• Does not handle very dynamic networks

• Often will not work over the Internet

Page 15: Project Mimir

Content Based Publish Subscribe

• Allows for expandable network architectures

• Allows for conditional matching and event notification

• Allows for fault-tolerant networks

• Needs a Distributed Multidimensional Matching algorithm to map the publish/subscribe problem to one of multidimensional indexing

Page 16: Project Mimir

Distributed Multidimensional Matching

• Requires
– each attribute has a known domain
– a known finest granularity
– a known global order of the attributes

• The mapping
– a d-dimensional space S, where d = number of attributes

– every attribute ai maps to a dimension di in S

– S is managed by a binary search tree that is a recursive subdivision of S into regions through (d−1)-dimensional hyperplanes

– each hyperplane divides a region in half
– each region has a corresponding node n(r) in the search tree
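A sketch of the subdivision (a hypothetical helper, assuming a unit domain [0, 1) per attribute): cycling through the d dimensions in the global order, each halving appends one bit, yielding a bit-string address for the region:

```python
def z_code(point, d, depth):
    """Address the region containing `point` in [0,1)^d after `depth`
    halvings. Dimensions are split in the known global attribute order,
    cycling 0..d-1; each halving appends one bit."""
    lo = [0.0] * d
    hi = [1.0] * d
    bits = []
    for level in range(depth):
        dim = level % d                 # hyperplane normal to this dimension
        mid = (lo[dim] + hi[dim]) / 2
        if point[dim] < mid:            # left half of the region -> bit 0
            bits.append("0")
            hi[dim] = mid
        else:                           # right half -> bit 1
            bits.append("1")
            lo[dim] = mid
    return "".join(bits)

# Two attributes (d = 2), four halvings:
assert z_code((0.1, 0.8), 2, 4) == "0101"
```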

Page 17: Project Mimir

DMM Continued

• Each region is addressed by a bit string called a z-code and is associated with one node in the tree.

• A subscription s is stored at all leaf nodes n(ri) in the search tree such that ri intersects s.

• The information of each node in the tree is stored at the peer p(r) in the DHT.

• Subscriptions are sent to the root peer and flow down to appropriate leaf nodes.

Page 18: Project Mimir

Resource Discovery (Topic Based Matching)

• Manage dynamic, distributed resources

• Nodes join groups when they have a desired resource and leave when it is no longer available

• Other nodes can request nearby resources by anycasting/multicasting messages to the appropriate group.

Page 19: Project Mimir

Implementation

• Each group has a key called a groupId, which maps into the DHT's ID space.

• Create a group
– send a message to that groupId
– nearest node becomes root of the spanning group tree
– root then adds the requesting node

• Join
– send a message to the root
– if an intermediate node is part of that spanning tree, add the requesting node as a child and stop
– otherwise keep forwarding the message

Page 20: Project Mimir

Continued

• Leaving the group
– If a node has entries in the children table
• mark it as not a member and stop
– otherwise
• send a leave message to its parent in the tree
• parent then removes the node from the children table

• Anycast
– Implemented as a DFS of the group tree
– Load balanced, since different requests start at different nodes
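The join/leave/anycast rules above can be modeled with a toy tree (illustrative only; real Scribe rides on Pastry's prefix routing, which is elided here):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []   # the "children table"
        self.member = True

def join(root, path_to_root):
    """path_to_root: the nodes a join message traverses toward the root
    (path_to_root[0] is the newcomer). The first node already in the
    spanning tree adopts the newcomer and the message stops there."""
    newcomer = path_to_root[0]
    for hop in path_to_root[1:]:
        if hop is root or hop.parent is not None:  # already in the tree
            hop.children.append(newcomer)
            newcomer.parent = hop
            return hop

def leave(node):
    """A node with children merely unmarks membership; a childless node
    detaches, and pruning cascades up through childless non-members."""
    node.member = False
    while node.parent is not None and not node.children and not node.member:
        parent = node.parent
        parent.children.remove(node)
        node.parent = None
        node = parent

def anycast(start, want):
    """DFS of the group tree from `start`; different entry points spread
    the load. Returns the first member satisfying the predicate."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if n.member and want(n):
            return n
        stack.extend(n.children)
        if n.parent is not None:
            stack.append(n.parent)
    return None

root, a, b = Node("root"), Node("a"), Node("b")
join(root, [a, root])       # root adopts a directly
join(root, [b, a, root])    # a is already in the tree, so a adopts b
assert b.parent is a
leave(b)                    # childless: detaches from a
assert a.children == []
assert anycast(a, lambda n: n.name == "root").name == "root"
```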

Page 21: Project Mimir

Multicast

• Like anycast but sends to all group members

• Uses O(N) bandwidth

• Useful for requesting data (values) when most members will have some data to contribute.

Page 22: Project Mimir

Pastry/Scribe in Mimir

• Pastry's Id routing is used to provide a network independent routing scheme.

• We have three Topics:

– Metadata Controller Topic
• provides security and path-to-fileId mapping

– Storage Node Topic
• provides a way to list available resources (file storage)

– Client Topic
• provides information on client nodes

Page 23: Project Mimir

File Storage Request

• Send a multicast request to the MDC to add a file to the system

• MDC sends the new file's id back to the requesting client

• Client then multicasts a store request message

• All storage nodes respond with IP/port data

• Client then connects directly to each storage node and stripes the encoded blocks over them evenly.

Page 24: Project Mimir

File Retrieval Request

• Multicast a GET FILE <FILE Id> message to the storage nodes

• Storage nodes look up what data they have for that file and send it back to the requesting Id.

• Client then rebuilds the file as data is received

• If the file cannot be rebuilt, print an error message.
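The retrieval flow can be modeled as a loop that attempts a rebuild as each response arrives (a toy stand-in: storage nodes are dicts, and try_decode abstracts the LT decoder; both names are hypothetical):

```python
def retrieve(file_id, storage_nodes, try_decode):
    """Toy model of the retrieval flow: 'multicast' the request to every
    storage node, gather whatever blocks each holds for file_id, and try
    to rebuild incrementally. try_decode returns the file once enough
    encoded blocks are present, else None."""
    received = []
    for node in storage_nodes:            # stands in for Scribe multicast
        received.extend(node.get(file_id, []))
        result = try_decode(received)     # rebuild as data is received
        if result is not None:
            return result
    raise IOError(f"file {file_id} could not be rebuilt "
                  f"({len(received)} blocks received)")

# Toy decode: the 'file' is whole once any 2 of its blocks arrive.
nodes = [{"f1": ["blk0"]}, {}, {"f1": ["blk1"]}]
decode = lambda blks: "".join(sorted(blks)) if len(blks) >= 2 else None
assert retrieve("f1", nodes, decode) == "blk0blk1"
```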

Page 25: Project Mimir

Advantages of Pastry for Mimir

• The P2P network provides a reliable way to handle a dynamic set of nodes and keep communication open with no central communication point

• Multicast lets us quickly attempt to store and retrieve files