© 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads...
![Page 1: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/1.jpg)
© 2013, A. Datta & F. Oggier, NTU Singapore
Storage codes: Managing Big Data with Small Overheads
Presented by
Anwitaman Datta & Frédérique E. Oggier Nanyang Technological University, Singapore
Tutorial at NetCod 2013, Calgary, Canada.
![Page 3: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/3.jpg)
Big Data Storage: Disclaimer
A note from the trenches: "You know you have a large storage system when you get paged at 1 AM because you only have a few petabytes of storage left." – from Andrew Fikes’ (Principal Engineer, Google) faculty summit talk “Storage Architecture and Challenges”, 2010.
We never get such calls!!
![Page 4: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/4.jpg)
Big data
• June 2011 EMC² study
  – world’s data is more than doubling every 2 years
    • faster than Moore’s Law
  – 1.8 zettabytes of data to be created in 2011
Big data: big problem? big opportunity?
* http://www.emc.com/about/news/press/2011/20110628-01.htm
Zetta: 10^21
Zettabyte: If you stored all of this data on DVDs, the stack would reach from the Earth to the moon and back.
![Page 5: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/5.jpg)
The data deluge: Some numbers
• Facebook “currently” (in 2010) stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (60 terabytes) each week and Facebook serves over one million images per second at peak. [quoted from a paper on “Haystack” from Facebook]
• On “Saturday”, photo number four billion was uploaded to photo sharing site Flickr. This comes just five and a half months after the 3 billionth and nearly 18 months after photo number two billion. – Mashable (13th October 2009) [http://mashable.com/2009/10/12/flickr-4-billion/]
![Page 8: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/8.jpg)
Scale how?
• Scale up: To scale vertically (or scale up) means to add resources to a single node in a system*
• Scale out: To scale horizontally (or scale out) means to add more nodes to a system, such as adding a new computer to a distributed software application*
Not distributing is not an option!
* Definitions from Wikipedia
![Page 10: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/10.jpg)
Failure (of parts) is Inevitable
• But, failure of the system is not an option either!
  – Failure is the pillar of rivals’ success …
![Page 11: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/11.jpg)
Deal with it
• Data from Los Alamos National Laboratory (DSN 2006), gathered over 9 years, 4750 machines, 24101 CPUs.
• Distribution of failures:
  – Hardware 60%
  – Software 20%
  – Network/Environment/Humans 5%
• Failure frequency ranged from once a day to once a month.
![Page 12: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/12.jpg)
Failure happens without fail, so …
• But, failure of the system is not an option either!
  – Failure is the pillar of rivals’ success …
• Solution: Redundancy
![Page 13: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/13.jpg)
Many Levels of Redundancy
• Physical
• Virtual resource
• Availability zone
• Region
• Cloud
From: http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html
![Page 14: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/14.jpg)
Redundancy Based Fault Tolerance
• Replicate data
  – e.g., 3 or more copies
  – In nodes on different racks
    • Can deal with switch failures
• Power back-up using battery between racks (Google)
![Page 15: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/15.jpg)
But At What Cost?
• Failure is not an option, but …
  – … are the overheads acceptable?
![Page 16: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/16.jpg)
Reducing the Overheads of Redundancy
• Erasure codes
  – Much lower storage overhead
  – High level of fault-tolerance
    • In contrast to replication or RAID based systems
• Has the potential to significantly improve the “bottom line”
  – e.g., both Google’s new DFS Colossus and Microsoft’s Azure now use ECs
![Page 22: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/22.jpg)
Erasure Codes (ECs)
• An (n,k) erasure code = a map that takes as input k blocks and outputs n blocks, thus introducing n-k blocks of redundancy.
• 3-way replication is a (3,1) erasure code!
  – Encoding: k=1 block → n=3 encoded blocks
• An erasure code such that the k original blocks can be recreated out of any k encoded blocks is called MDS (maximum distance separable).
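The MDS property above can be illustrated with a tiny (3,2) single-parity code: two data blocks plus their XOR. This is only a sketch for intuition; the function names are hypothetical and the code is not taken from the tutorial.

```python
from itertools import combinations

def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_3_2(o1, o2):
    """(3,2) code: k=2 original blocks in, n=3 encoded blocks out."""
    return [o1, o2, xor_blocks(o1, o2)]

def decode_3_2(available):
    """Recover (o1, o2) from any two encoded blocks, given as {index: block}."""
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available:  # o1 and the parity block survive
        return available[0], xor_blocks(available[0], available[2])
    # o2 and the parity block survive
    return xor_blocks(available[1], available[2]), available[1]

blocks = encode_3_2(b"big ", b"data")
# MDS: every choice of k=2 out of n=3 blocks recreates the original data.
for i, j in combinations(range(3), 2):
    assert decode_3_2({i: blocks[i], j: blocks[j]}) == (b"big ", b"data")
```

Any single lost block is tolerated here at 1.5x storage, whereas 3-way replication tolerates two losses at 3x storage.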
![Page 23: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/23.jpg)
Erasure Codes (ECs)
• Originally designed for communication
  – EC(n,k)
[Figure: Data = message, as k original blocks O1, O2, …, Ok → Encoding → n encoded blocks B1, B2, …, Bn (some blocks lost) → Receive any k’ (≥ k) blocks → Decoding → Reconstruct data: original k blocks O1, O2, …, Ok]
![Page 24: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/24.jpg)
Erasure Codes for Networked Storage
[Figure: Data = object, as k blocks O1, O2, …, Ok → Encoding → n encoded blocks B1, B2, …, Bn (stored in storage devices in a network; some blocks lost) → Retrieve any k’ (≥ k) blocks → Decoding → Reconstruct data: original k blocks O1, O2, …, Ok]
![Page 25: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/25.jpg)
HDFS-RAID
• Distributed RAID File system (DRFS) client
  – provides application access to the files in the DRFS
  – transparently recovers any corrupt or missing blocks encountered when reading a file (degraded read)
    • Does not carry out repairs
• RaidNode, a daemon that creates and maintains parity files for all data files stored in the DRFS
• BlockFixer, which periodically recomputes blocks that have been lost or corrupted
  – RaidShell allows on-demand repair triggered by administrator
• Two kinds of erasure codes implemented
  – XOR code and Reed-Solomon code (typically 10+4 w/ 1.4x overhead)
From http://wiki.apache.org/hadoop/HDFS-RAID
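The "10+4 w/ 1.4x overhead" figure follows directly from n/k. A small sketch of the arithmetic, comparing it with 3-way replication; the helper name `profile` is hypothetical, and the loss tolerance shown assumes an MDS code such as Reed-Solomon.

```python
def profile(data_blocks, parity_blocks):
    """Summarise an (n, k) MDS code with k data and n-k redundant blocks."""
    n = data_blocks + parity_blocks
    return {
        "n": n,
        "k": data_blocks,
        "overhead": n / data_blocks,        # stored bytes per byte of data
        "tolerated_losses": parity_blocks,  # any n-k blocks may be lost
    }

print(profile(1, 2))   # 3-way replication viewed as a (3,1) code
print(profile(10, 4))  # Reed-Solomon 10+4, i.e. a (14,10) code
```

Under these assumptions the (14,10) code survives twice as many block losses as 3-way replication while storing less than half as much.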
![Page 27: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/27.jpg)
Replenishing Lost Redundancy for ECs
• Repair needed for long term resilience.
• Repairs are expensive!
[Figure: n encoded blocks B1, B2, …, Bn (some blocks lost) → Retrieve any k’ (≥ k) blocks → Decoding → original k blocks O1, O2, …, Ok → Encoding → Recreate lost blocks → Re-insert in (new) storage devices, so that there are (again) n encoded blocks]
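A rough sense of why such repairs are expensive: with the decode-and-re-encode approach, recreating even one lost block means fetching k blocks. The sketch below only does that arithmetic; the 64 MB block size and the function name are assumptions made for illustration.

```python
BLOCK_MB = 64  # assumed block size, for illustration only

def repair_traffic_mb(k, block_mb=BLOCK_MB):
    """MB fetched to recreate a lost block by first decoding from k blocks."""
    return k * block_mb

for name, k in [("3-way replication, (3,1)", 1),
                ("Reed-Solomon 10+4, (14,10)", 10)]:
    print(f"{name}: fetch {repair_traffic_mb(k)} MB "
          f"to recreate one {BLOCK_MB} MB block")
```

The storage saving of the (14,10) code is paid for with a tenfold increase in data read per repaired block, which is the gap that codes tailored for storage try to close.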
![Page 33: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/33.jpg)
Tailor-Made Codes for Storage
Desired code properties include:
• Low storage overhead
• Good fault tolerance
  (Traditional MDS erasure codes achieve these.)
• Better repairability
  – Smaller repair fan-in
  – Reduced I/O for repairs
  – Possibility of multiple simultaneous repairs
  – Fast repairs
  – Efficient B/W usage
  – …
• Better …
  – Better data-insertion
  – Better migration to archival
  – …
![Page 36: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/36.jpg)
Pyramid (Local Reconstruction) Codes
– Good for degraded reads (data locality)
– Not all repairs are cheap (only partial parity locality)
Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems, C. Huang et al. @ NCA 2007
Erasure Coding in Windows Azure Storage, C. Huang et al. @ USENIX ATC 2012
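The data-locality idea can be sketched with local XOR parities: data blocks are split into groups, each with its own parity, so a lost data block is rebuilt from its group alone. This is a simplified illustration, not the construction from the cited papers: the group sizes are made up, and the global parities (which need finite-field arithmetic) are omitted.

```python
from functools import reduce

def xor_all(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Six data blocks in two local groups of three (illustrative sizes).
data = [bytes([i] * 4) for i in range(1, 7)]
groups = [data[:3], data[3:]]
local_parity = [xor_all(group) for group in groups]

def repair_data_block(group_idx, lost_pos):
    """Rebuild one data block from its own group: 3 reads instead of k=6."""
    survivors = [b for i, b in enumerate(groups[group_idx]) if i != lost_pos]
    return xor_all(survivors + [local_parity[group_idx]])

assert repair_data_block(1, 2) == data[5]
```

A degraded read of any single data block touches only one group; repairing a lost global parity would still need all the data blocks, which is the "partial parity locality" caveat.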
![Page 37: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/37.jpg)
Regenerating Codes
• Network information flow based arguments to determine “optimal” trade-off of storage/repair-bandwidth
![Page 38: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/38.jpg)
Locally Repairable Codes
• Codes satisfying: low repair fan-in, for any failure
• The name is reminiscent of “locally decodable codes”
![Page 41: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/41.jpg)
Self-repairing Codes
• Usual disclaimer: “To the best of our knowledge”
  – First instances of locally repairable codes
    • Self-repairing Homomorphic Codes for Distributed Storage Systems – Infocom 2011
    • Self-repairing Codes for Distributed Storage Systems – A Projective Geometric Construction – ITW 2011
  – Since then, there have been many other instances from other researchers/groups
• Note
  – k encoded blocks are enough to recreate the object
    • Caveat: not any arbitrary k (i.e., SRCs are not MDS)
    • However, there are many such k combinations
![Page 42: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/42.jpg)
Self-repairing Codes: Blackbox View
[Figure: n encoded blocks B1, B2, …, Bn (stored in storage devices in a network; some blocks lost) → Retrieve some k” (< k) blocks (e.g. k”=2) to recreate a lost block → Re-insert in (new) storage devices, so that there are (again) n encoded blocks]
![Page 46: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/46.jpg)
PSRC Example
Repair using two nodes (say N1 and N3): four pieces needed to regenerate two pieces
(o1+o2+o4) + (o1) => o2+o4
(o3) + (o2+o3) => o2
(o1) + (o2) => o1+o2
Repair using three nodes (say N2, N3 and N4): three pieces needed to regenerate two pieces
(o1+o2+o4) + (o4) => o1+o2
(o2) + (o4) => o2+o4
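The repair equations above are sums over GF(2), i.e. bytewise XOR, so repeated terms cancel. A minimal sketch that checks each identity on random blocks; the helper `xor` and the block size are illustrative, and the mapping of pieces to nodes is not modelled.

```python
import os

def xor(*blocks):
    """Bytewise XOR (addition over GF(2)) of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)

o1, o2, o3, o4 = (os.urandom(8) for _ in range(4))

# Repair using two nodes: four stored pieces are read.
assert xor(xor(o1, o2, o4), o1) == xor(o2, o4)
recovered_o2 = xor(o3, xor(o2, o3))
assert recovered_o2 == o2
assert xor(o1, recovered_o2) == xor(o1, o2)

# Repair using three nodes: three stored pieces are read.
assert xor(xor(o1, o2, o4), o4) == xor(o1, o2)
assert xor(o2, o4) == xor(o4, o2)
```

In both cases the two lost pieces, o1+o2 and o2+o4, are regenerated without ever reconstructing the whole object.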
![Page 50: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/50.jpg)
Recap
[Figure: replicas and erasure coded data, annotated with the following labels]
• Data insertion: pipelined insertion (e.g., data for analytics)
• Data access: fault-tolerant data access (MSR’s Reconstruction code)
• (partial) re-encode/repair (e.g., Self-Repairing Codes)
![Page 51: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/51.jpg)
Next
[Figure: replicas and erasure coded data, annotated with the following labels]
• Data insertion: pipelined insertion (e.g., data for analytics)
• In-network coding (e.g., multimedia)
• Data access: fault-tolerant data access (MSR’s Reconstruction code)
• (partial) re-encode/repair (e.g., Self-Repairing Codes)
![Page 57: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/57.jpg)
Inserting Redundant Data
• Data insertion
  – Replicas can be inserted in a pipelined manner
  – Traditionally, erasure coded systems used a central point of processing
Can the process of redundancy generation be distributed among the storage nodes?
![Page 61: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/61.jpg)
© 2013, A. Datta & F. Oggier, NTU Singapore61
In-network coding
• Ref: In-Network Redundancy Generation for Opportunistic Speedup of Backup, Future Generation Comp. Syst. 2013
• Motivations– Reduce the bottleneck at a single point
• The “source” (or first point of processing) still needs to inject “enough information” for the network to be able to carry out the rest of the redundancy generation
– Utilize network resources opportunistically• Data-centers: When network/nodes are not busy doing other things• P2P/F2F: When nodes are online
![Page 62: © 2013, A. Datta & F. Oggier, NTU Singapore Storage codes: Managing Big Data with Small Overheads Presented by Anwitaman Datta & Frédérique E. Oggier Nanyang.](https://reader031.fdocuments.us/reader031/viewer/2022032701/56649c785503460f9492d202/html5/thumbnails/62.jpg)
© 2013, A. Datta & F. Oggier, NTU Singapore62
In-network coding
• Ref: In-Network Redundancy Generation for Opportunistic Speedup of Backup, Future Generation Comp. Syst. 2013
• Motivations
– Reduce the bottleneck at a single point
  • The “source” (or first point of processing) still needs to inject “enough information” for the network to be able to carry out the rest of the redundancy generation
– Utilize network resources opportunistically
  • Data-centers: When network/nodes are not busy doing other things
  • P2P/F2F: When nodes are online
More traffic (ephemeral resource) but higher data insertion throughput
• Dependencies among self-repairing coded fragments can be exploited for in-network coding!
• Consider an SRC(7,3) code with the following dependencies:
– r1, r2, r3=r1+r2, r4, r5=r1+r4, r6=r2+r4, r7=r1+r6
  • Note: r6=r1+r7, etc. also … [details in Infocom 2011 paper]
• A naïve approach
[Figure: naïve insertion example, with nodes labeled r1, r2, r4, r6, r7]
Need a good “schedule” to insert the redundancy!
– Avoid cycles/dependencies
Subject to “unpredictable” availability of resources!
Turns out to be O(n!) with an oracle
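The dependencies listed above can be tried out with a small sketch. It assumes that fragments are equal-length byte strings and that “+” is bytewise XOR; the function names, the 16-byte fragment size and the greedy derivation loop are illustrative assumptions, not the scheduling algorithm from the referenced paper. The source injects only r1, r2 and r4, and the remaining fragments are derived once their inputs exist.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR, standing in for the '+' on the slide (an assumption)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Dependencies from the slide: fragment -> the two fragments it is the sum of.
DEPENDENCIES = {
    "r3": ("r1", "r2"),
    "r5": ("r1", "r4"),
    "r6": ("r2", "r4"),
    "r7": ("r1", "r6"),
}

def generate_in_network(injected: dict) -> tuple:
    """Derive every fragment whose two inputs are available.

    Returns (all fragments, order in which derived fragments were created).
    Hypothetical helper: it ignores node availability entirely.
    """
    fragments = dict(injected)
    order = []
    progress = True
    while progress:
        progress = False
        for target, (a, b) in DEPENDENCIES.items():
            if target not in fragments and a in fragments and b in fragments:
                fragments[target] = xor(fragments[a], fragments[b])
                order.append(target)
                progress = True
    return fragments, order

# The source injects three fragments; the other four are generated afterwards.
source = {name: os.urandom(16) for name in ("r1", "r2", "r4")}
fragments, order = generate_in_network(source)

print(order)  # ['r3', 'r5', 'r6', 'r7']
# The alternative dependency noted on the slide also holds: r6 = r1 + r7.
print(fragments["r6"] == xor(fragments["r1"], fragments["r7"]))  # True
```

Note that r7 can only be produced after r6 exists, which is the kind of ordering constraint a schedule has to respect.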
• Heuristics
– Several other policies (such as max data) were also tried
• RndFlw was the best heuristic
– Among those we tried
• Provided 40% (out of a possible 57%) bandwidth savings at source for a SRC(7,3) code
– An increase in the data-insertion throughput of between 40% and 60%
• No free lunch: Increase of 20-30% in overall network traffic
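One way to read the “possible 57%” figure (an interpretation, not stated on the slide) is as the share of fragments the source no longer uploads when it injects only k of the n fragments and the network generates the rest. The function name below is hypothetical.

```python
def source_saving_upper_bound(n: int, k: int) -> float:
    """Fraction of source upload avoided if the source sends k of n fragments.

    Assumption: without in-network coding the source uploads all n fragments.
    """
    return (n - k) / n

# For the SRC(7,3) code: (7 - 3) / 7
print(f"{source_saving_upper_bound(7, 3):.0%}")  # 57%
```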
Recap
• Data insertion
– Replicas: pipelined insertion (e.g., data for analytics)
– Erasure coded data: in-network coding (e.g., multimedia)
• Data access
– Fault-tolerant data access (MSR’s Reconstruction code)
• (Partial) re-encode/repair (e.g., Self-Repairing Codes)
Next
• Data insertion
– Replicas: pipelined insertion (e.g., data for analytics)
– Erasure coded data: in-network coding (e.g., multimedia)
• Archival of “cold” data
• Data access
– Fault-tolerant data access (MSR’s Reconstruction code)
• (Partial) re-encode/repair (e.g., Self-Repairing Codes)
RapidRAID
• Ref:
– RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems (Infocom 2013)
  • Has some local repairability properties, but that aspect is yet to be explored
– Another code instance @ ICDCN 2013
  • Decentralized Erasure Coding for Efficient Data Archival in Distributed Storage Systems
    – Systematic code (unlike RapidRAID)
    – Found using numerical methods; a general theory for the construction of such codes, as well as their repairability properties, remain open issues
• Problem statement: Can the existing (replication based) redundancy be exploited to create an erasure coded archive?
Slight change of view
[Figure: Two ways to look at replicated data. Left: three full copies, each holding blocks S1 S2 S3 S4. Right: the same data grouped per block, with three copies each of S1, S2, S3 and S4.]
RapidRAID
Decentralizing the hitherto centralized encoding process
RapidRAID – Example (8,4) code
• Initial configuration
• Logical phase 1: Pipelined coding
• Logical phase 2: Further local coding
Resulting Linear Code
RapidRAID: Some results
Big pic
• Data insertion
– Replicas: pipelined insertion (e.g., data for analytics)
– Erasure coded data: in-network coding (e.g., multimedia)
• Archival of “cold” data
• Data access
– Fault-tolerant data access (MSR’s Reconstruction code)
• (Partial) re-encode/repair (e.g., Self-Repairing Codes)
Agenda: A composite system achieving all these properties …
Wrapping up: A moment of reflection
• Revisiting repairability – an engineering alternative
– Redundantly Grouped Cross-object Coding for Repairable Storage (APSys2012)
– The CORE Storage Primitive: Cross-Object Redundancy for Efficient Data Repair & Access in Erasure Coded Storage (arXiv:1302.5192)
  • HDFS-RAID compatible implementation
  • http://sands.sce.ntu.edu.sg/StorageCORE/
[Figure: CORE layout. Each row holds the erasure coded pieces of one object; the last row holds the parity of each column.]

e11    e12    …    e1k    e1k+1    …    e1n
e21    e22    …    e2k    e2k+1    …    e2n
…
em1    em2    …    emk    emk+1    …    emn
p1     p2     …    pk     pk+1     …    pn

Rows: Erasure coding of individual objects
Columns: RAID-4 of erasure coded pieces of different objects
(reminiscent of product codes!)
Separation of concerns
• Two distinct design objectives for distributed storage systems
– Fault-tolerance
– Repairability
• An extremely simple idea
– Introduce two different kinds of redundancy
  • Any (standard) erasure code – for fault-tolerance
  • RAID-4 like parity (across encoded pieces of different objects) – for repairability
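The second kind of redundancy can be sketched as follows. This is a minimal illustration, not the CORE implementation: the erasure coded pieces are placeholder byte strings (no actual erasure code is run), parity is bytewise XOR, and the names `column_parity` and `repair` as well as the sizes m = 3 and n = 4 are assumptions chosen for the example.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def column_parity(pieces: list) -> bytes:
    """RAID-4 like parity: XOR of equal-length pieces."""
    return reduce(xor_bytes, pieces)

# m = 3 objects, each assumed to be already erasure coded into n = 4 pieces.
# encoded[i][j] plays the role of piece j of object i (placeholder contents).
m, n = 3, 4
encoded = [[bytes([16 * i + j] * 4) for j in range(n)] for i in range(m)]

# One parity piece per column, computed across the m different objects.
parity = [column_parity([row[j] for row in encoded]) for j in range(n)]

def repair(encoded: list, parity: list, i: int, j: int) -> bytes:
    """Rebuild piece j of object i from the other m - 1 pieces in column j
    plus the column parity, so m pieces are read in total."""
    survivors = [row[j] for r, row in enumerate(encoded) if r != i]
    return column_parity(survivors + [parity[j]])

# Losing piece 2 of object 1 is repaired from its column alone.
print(repair(encoded, parity, 1, 2) == encoded[1][2])  # True
```

The repair touches only one column, so the base erasure code of each object is not involved unless the column itself has further failures.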
CORE repairability
• Choosing a suitable m < k
– Reduction in data transfer for repair
– Repair fan-in disentangled from base code parameter “k”
  • Large “k” may be desirable for faster (parallel) data access
  • Codes typically have trade-offs between repair fan-in, code parameter “k” and code’s storage overhead (n/k)
• However: The gains from reduced fan-in are probabilistic
– For i.i.d. failures with probability “f”
• Possible to reduce repair time
– By pipelining data through the live nodes, and computing partial parity
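Why the gain is probabilistic can be sketched under simple assumptions that are not taken from the slides: a lost piece is repaired through its column parity only if the other m pieces of that column (m − 1 sibling pieces plus the parity) are all available, each piece fails independently with probability f, and otherwise the repair falls back to the base erasure code with fan-in k. The function names and the example values are hypothetical, and the model ignores cases where the fallback itself fails.

```python
def parity_repair_probability(m: int, f: float) -> float:
    """Probability that all m other pieces in the column are alive."""
    return (1 - f) ** m

def expected_fan_in(m: int, k: int, f: float) -> float:
    """Expected number of pieces read to repair one lost piece,
    assuming a fallback to the base code (fan-in k) always succeeds."""
    p = parity_repair_probability(m, f)
    return p * m + (1 - p) * k

# Example values (assumptions): m = 3 objects per group, k = 8, f = 0.1.
print(parity_repair_probability(3, 0.1))
print(expected_fan_in(3, 8, 0.1))
```

Under this model the expected fan-in always lies between m and k, and it approaches m as f becomes small.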
• Interested to
– Follow: http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/
  • Also, two surveys on (repairability of) storage codes
    – one short, at high level (SIGACT Distr. Comp. News, Mar. 2013)
    – one detailed (FnT, June 2013)
– Get involved: {anwitaman,frederique}@ntu.edu.sg
Thank you