Project Mimir


Transcript of Project Mimir

Page 1: Project Mimir

Project Mimir

• A Distributed Filesystem

• Uses Rateless Erasure Codes for Reliability

• Uses Pastry's Multicast System, Scribe, for Resource Discovery and Utilization

Page 2: Project Mimir

Erasure Codes

• An erasure code transforms a message of n blocks into a message with more than n blocks, such that the original message can be recovered from a subset of those blocks. The fraction of the blocks required is called the rate, denoted r.
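As a minimal sketch of this definition (illustrative only, not Mimir's code): a single XOR parity block turns n source blocks into n + 1 encoded blocks, any n of which recover the original, for a rate of r = n/(n + 1):

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_with_parity(blocks):
    """n source blocks -> n + 1 encoded blocks (one XOR parity block)."""
    return blocks + [reduce(xor, blocks)]

def recover_missing(received):
    """With exactly one block missing, XOR of the n received blocks
    reconstructs it (any n of the n + 1 blocks suffice)."""
    return reduce(xor, received)

blocks = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]          # n = 3
encoded = encode_with_parity(blocks)                      # m = 4, rate r = 3/4
survivors = [b for i, b in enumerate(encoded) if i != 1]  # lose block 1
assert recover_missing(survivors) == blocks[1]
```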

Page 3: Project Mimir

Optimal Erasure Codes

Page 4: Project Mimir

Rateless Erasure Codes

• Also called Fountain Codes

• Are rateless because they can produce an infinite stream of encoded data.

• Most rateless codes are sub-optimal: a file of n blocks can be decoded from any m encoded blocks, where m ≥ (1 + ε)n.

Page 5: Project Mimir

Luby Transform

• Low density

• First rateless erasure code discovered

• Encodes blocks by randomly selecting a degree d (1 ≤ d ≤ n) and then using the XOR operation on d random un-encoded blocks.

• Decodes blocks by the principle that
– A XOR B = C
– A XOR C = B
– B XOR C = A
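The encode/decode procedure above can be sketched in Python (a toy illustration: degrees are drawn uniformly here, whereas a real LT code draws them from the robust soliton distribution):

```python
import random

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def lt_encode(blocks, rng):
    """One encoded block: pick a degree d (1 <= d <= n), then XOR d
    randomly chosen source blocks together."""
    d = rng.randint(1, len(blocks))        # uniform degree (toy choice)
    idx = frozenset(rng.sample(range(len(blocks)), d))
    payload = b"\x00" * len(blocks[0])
    for i in idx:
        payload = xor(payload, blocks[i])
    return idx, payload

def lt_decode(n, encoded):
    """Peeling decoder: find a block with one unknown source, recover it
    via the XOR identities (A ^ B = C implies A ^ C = B), repeat."""
    known = {}
    progress = True
    while progress and len(known) < n:
        progress = False
        for idx, payload in encoded:
            pending = set(idx) - set(known)
            if len(pending) == 1:
                i = pending.pop()
                for j in set(idx) - {i}:
                    payload = xor(payload, known[j])
                known[i] = payload
                progress = True
    return [known.get(i) for i in range(n)]

# Deterministic demo of the peeling step:
enc = [(frozenset({0}), b"A"),
       (frozenset({0, 1}), xor(b"A", b"B")),
       (frozenset({0, 1, 2}), xor(xor(b"A", b"B"), b"C"))]
assert lt_decode(3, enc) == [b"A", b"B", b"C"]
```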

Page 6: Project Mimir
Page 7: Project Mimir

Encoded Block Identification

The decoder must be able to identify an encoded block's degree and which blocks were used to encode it.

• Pass the seed used in the random number generator to regenerate an identical encoding.
– Additional decoder overhead
– All encoders and decoders must use the same random number generators

• Attach binary headers to encoded blocks where each bit represents whether a specific block was used in the encoding.
– Additional network overhead
– Header bit length equals n
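The seed approach can be sketched as follows (assuming, for illustration, that Python's random module is the generator every encoder and decoder shares):

```python
import random

def choose_blocks(seed, n):
    """Derive (degree, block indices) purely from the seed. Encoder and
    decoder run this identically, so only the seed travels with the
    payload instead of an n-bit header."""
    rng = random.Random(seed)
    d = rng.randint(1, n)
    return rng.sample(range(n), d)

# The decoder regenerates exactly what the encoder chose:
assert choose_blocks(12345, 8) == choose_blocks(12345, 8)
```

The trade-offs on the slide are visible here: the decoder pays extra CPU to replay the generator, and the scheme silently breaks if any peer uses a different PRNG.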

Page 8: Project Mimir

Common Uses of LT Codes

• One-way communication protocols over noisy channels
– File streaming (IPTV and other media)
– Long-distance communication
– Satellite communications
– Mobile phones
– High-latency networks

Page 9: Project Mimir

LT Codes in Mimir

• Encoded blocks striped evenly across a large network of computers.

• Generate x·n encoded blocks; successful decoding is guaranteed even when a fraction 1 − (2/x) of all network nodes fail.

• Distributed disk space equals real disk space / x
– 50 computers × 100 GB each = 5 TB
– x = 4; distributed disk space = 1.25 TB
– 100% reliability while at least 25 computers are online
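The slide's figures check out arithmetically (x = 4 is the redundancy factor):

```python
computers, disk_per_node_gb = 50, 100
x = 4                                         # generate x*n encoded blocks

raw_tb = computers * disk_per_node_gb / 1000  # 5.0 TB of real disk
usable_tb = raw_tb / x                        # 1.25 TB of distributed space
survivors_needed = computers * 2 / x          # 25 nodes must stay online
failure_tolerance = 1 - 2 / x                 # up to 50% of nodes may fail
assert (raw_tb, usable_tb, survivors_needed, failure_tolerance) == (5.0, 1.25, 25.0, 0.5)
```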

Page 10: Project Mimir
Page 11: Project Mimir

Challenges

• Cannot encode new data unless the file is reconstructed first; a high-churn network therefore requires a lot of computation

• Decoding is still probabilistic, although the probability of success is extremely high: greater than 99.99999% at the failure limit

Page 12: Project Mimir

Modifications to LT Code

• Modified the LT Code to guarantee each block is encoded an equal number of times (evenly saturated). RobuStore also does this.

• Evenly distribute according to block degree

• Modified distribution
– spiking – n unique blocks with degree 1
– offset distribution
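One plausible reading of "evenly saturated" (an assumption on my part, not necessarily the authors' exact scheme): always draw from the least-used source blocks, so every block's usage count stays level:

```python
import random

def saturated_select(use_counts, d, rng):
    """Pick d source blocks, preferring the least-used ones so coverage
    stays even ('evenly saturated'); ties are broken randomly."""
    order = sorted(range(len(use_counts)),
                   key=lambda i: (use_counts[i], rng.random()))
    chosen = order[:d]
    for i in chosen:
        use_counts[i] += 1
    return chosen

rng = random.Random(0)
counts = [0] * 6
for _ in range(12):
    saturated_select(counts, 2, rng)  # 12 draws * degree 2 = 24 uses
assert max(counts) - min(counts) == 0  # every block used exactly 24/6 = 4 times
```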

Page 13: Project Mimir

Application-Level Anycast/Multicast

• Issues with network-level multicast

• Uses for multicast
– Publish/subscribe architecture
– Resource advertisement/discovery
– Mass content distribution

Page 14: Project Mimir

Issues with network level multicast

• Difficult to set up

• Does not handle large numbers of multicast and anycast groups

• Does not handle very dynamic networks

• Often will not work over the Internet

Page 15: Project Mimir

Content Based Publish Subscribe

• Allows for expandable network architectures

• Allows for conditional matching and event notification

• Allows for fault-tolerant networks

• Needs a Distributed Multidimensional Matching algorithm to map the publish/subscribe problem to one of multidimensional indexing

Page 16: Project Mimir

Distributed Multidimensional Matching

• Requires
– each attribute has a known domain
– a known finest granularity
– a known global order of the attributes

• The mapping
– a d-dimensional space S, where d = number of attributes

– every attribute ai maps to a dimension di in S

– S is managed by a binary search tree that is a recursive subdivision of S into regions through (d−1)-dimensional hyperplanes

– each hyperplane divides a region in half
– each region has a corresponding node n(r) in the search tree
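A sketch of the subdivision (a hypothetical helper, assuming a unit domain [0, 1) per attribute): cycling through the d dimensions in the global order, each halving appends one bit, yielding a bit-string address for the region:

```python
def z_code(point, d, depth):
    """Address the region containing `point` in [0,1)^d after `depth`
    halvings. Dimensions are split in the known global attribute order,
    cycling 0..d-1; each halving appends one bit."""
    lo = [0.0] * d
    hi = [1.0] * d
    bits = []
    for level in range(depth):
        dim = level % d                 # hyperplane normal to this dimension
        mid = (lo[dim] + hi[dim]) / 2
        if point[dim] < mid:            # left half of the region -> bit 0
            bits.append("0")
            hi[dim] = mid
        else:                           # right half -> bit 1
            bits.append("1")
            lo[dim] = mid
    return "".join(bits)

# Two attributes (d = 2), four halvings:
assert z_code((0.1, 0.8), 2, 4) == "0101"
```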

Page 17: Project Mimir

DMM Continued

• Each region is addressed by a bit string called a z-code and is associated with one node in the tree.

• A subscription s is stored at all leaf nodes n(ri) in the search tree such that ri intersects s.

• The information of each node in the tree is stored at the peer p(r) in the DHT.

• Subscriptions are sent to the root peer and flow down to appropriate leaf nodes.

Page 18: Project Mimir

Resource Discovery (Topic Based Matching)

• Manage dynamic, distributed resources

• Nodes join groups when they have a desired resource and leave when it is no longer available

• Other nodes can request nearby resources by anycasting/multicasting messages to the appropriate group.

Page 19: Project Mimir

Implementation

• Each group has a key called a groupId, which maps into the DHT's ID space.

• Create a group
– send a message to that groupId
– nearest node becomes root of the spanning group tree
– root then adds the requesting node

• Join
– send a message to the root
– if an intermediate node is part of that spanning tree, add the requesting node as a child and stop
– otherwise keep forwarding the message

Page 20: Project Mimir

Continued

• Leaving the group
– If a node has entries in the children table
• mark it as not a member and stop
– otherwise
• send a leave message to its parent in the tree
• parent then removes the node from the children table

• Anycast
– Implemented as a DFS of the group tree
– Load balanced, since different requests start at different nodes
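The join/leave/anycast rules above can be modeled with a toy tree (illustrative only; real Scribe rides on Pastry's prefix routing, which is elided here):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []   # the "children table"
        self.member = True

def join(root, path_to_root):
    """path_to_root: the nodes a join message traverses toward the root
    (path_to_root[0] is the newcomer). The first node already in the
    spanning tree adopts the newcomer and the message stops there."""
    newcomer = path_to_root[0]
    for hop in path_to_root[1:]:
        if hop is root or hop.parent is not None:  # already in the tree
            hop.children.append(newcomer)
            newcomer.parent = hop
            return hop

def leave(node):
    """A node with children merely unmarks membership; a childless node
    detaches, and pruning cascades up through childless non-members."""
    node.member = False
    while node.parent is not None and not node.children and not node.member:
        parent = node.parent
        parent.children.remove(node)
        node.parent = None
        node = parent

def anycast(start, want):
    """DFS of the group tree from `start`; different entry points spread
    the load. Returns the first member satisfying the predicate."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if n.member and want(n):
            return n
        stack.extend(n.children)
        if n.parent is not None:
            stack.append(n.parent)
    return None

root, a, b = Node("root"), Node("a"), Node("b")
join(root, [a, root])       # root adopts a directly
join(root, [b, a, root])    # a is already in the tree, so a adopts b
assert b.parent is a
leave(b)                    # childless: detaches from a
assert a.children == []
assert anycast(a, lambda n: n.name == "root").name == "root"
```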

Page 21: Project Mimir

Multicast

• Like anycast but sends to all group members

• Uses O(N) bandwidth

• Useful for requesting data (values) when most members will have some data to contribute.

Page 22: Project Mimir

Pastry/Scribe in Mimir

• Pastry's Id routing is used to provide a network independent routing scheme.

• We have three Topics:

– Metadata Controller Topic
• provides security and path-to-fileId mapping

– Storage Node Topic
• provides a way to list available resources (file storage)

– Client Topic
• provides information on client nodes

Page 23: Project Mimir

File Storage Request

• Send a multicast request to the MDC to add a file to the system

• MDC sends the new file's id back to the requesting client

• Client then multicasts a store request message

• All storage nodes respond with IP/port data

• Client then connects directly to each storage node and stripes the encoded blocks over them evenly.

Page 24: Project Mimir

File Retrieval Request

• Multicast a GET FILE <FILE Id> message to the storage nodes

• Storage nodes look up what data they have for that file and send it back to the requesting Id.

• Client then rebuilds the file as data is received

• If the file cannot be rebuilt, print an error message.
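The retrieval flow can be modeled as a loop that attempts a rebuild as each response arrives (a toy stand-in: storage nodes are dicts, and try_decode abstracts the LT decoder; both names are hypothetical):

```python
def retrieve(file_id, storage_nodes, try_decode):
    """Toy model of the retrieval flow: 'multicast' the request to every
    storage node, gather whatever blocks each holds for file_id, and try
    to rebuild incrementally. try_decode returns the file once enough
    encoded blocks are present, else None."""
    received = []
    for node in storage_nodes:            # stands in for Scribe multicast
        received.extend(node.get(file_id, []))
        result = try_decode(received)     # rebuild as data is received
        if result is not None:
            return result
    raise IOError(f"file {file_id} could not be rebuilt "
                  f"({len(received)} blocks received)")

# Toy decode: the 'file' is whole once any 2 of its blocks arrive.
nodes = [{"f1": ["blk0"]}, {}, {"f1": ["blk1"]}]
decode = lambda blks: "".join(sorted(blks)) if len(blks) >= 2 else None
assert retrieve("f1", nodes, decode) == "blk0blk1"
```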

Page 25: Project Mimir

Advantages of Pastry for Mimir

• The P2P network provides a reliable way to handle a dynamic set of nodes and keep communication open with no central communication point

• Multicast lets us quickly attempt to store and retrieve files