RozoFS: a fault tolerant I/O intensive distributed file...

Page 1: RozoFS: a fault tolerant I/O intensive distributed file ...projects.laas.fr/.../parrein_slides.pdf · RozoFS: a fault tolerant I/O intensive distributed file system based on Mojette

RozoFS: a fault tolerant I/O intensive distributed file system based on Mojette erasure code

Workshop Autonomic Oct. 16-17 2014

Laas, Europe, Toulouse

Benoît Parrein, Dimitri Pertin*, Nicolas Normand, Didier Féron°

Université de Nantes, IRCCyN Lab, UMR 6597

* Université de Nantes - Fizians SAS

°Fizians SAS

Outline

Storage and file systems

FEC4Cloud project

Mojette Erasure code

RozoFS global architecture

Performances

Workshop Autonomique, Oct. 16-17 2014, Toulouse, France - Benoît Parrein, Université de Nantes, IRCCyN Lab

The storage in the world

40 exabytes (10^18 bytes) stored in 2020

15 EB (37%) in the Cloud(s)

7.5 EB (50% of the Cloud share): video, images, ...

High availability means...

99.999999...% reachable

Copies and copies and copies... (up to 7 times)

Hard disks and hard disks and hard disks...

High consumption of energy

Privacy problems

Requirements

Convergence of archiving, big data, virtualisation, HPC, ...

=

Convergence of cold and hot data

Distributed File Systems

HDFS

CephFS, GlusterFS,...

Scality (based on Chord)

Facebook file system (f4)

...

Mix of replicas and erasure coding (sometimes): none of them uses erasure codes all the time

FEC4Cloud Project

ANR 2012 (Emergence call)

Partners: IRCCyN (lead), ISAE, SATT-Ouest Valorisation

Budget: 256 K€

Duration: 24 months (product oriented)

Goal: promoting erasure codes within Cloud storage infrastructure

Erasure codes (MDS property)

k messages → Encoding → n code-words

any k received code-words → Decoding → k messages
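The MDS property above (any k of the n code-words are enough to recover the k messages) can be illustrated with a toy code. This is a minimal sketch, not one of the implementations discussed in this talk: it assumes a Reed-Solomon-style polynomial code over the small prime field GF(257), and the names `encode`, `decode` and `lagrange_eval` are hypothetical.

```python
from itertools import combinations

P = 257  # prime field size (assumption: message symbols are integers below 257)


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(message, n):
    """Systematic encoding: code-word i is the polynomial value at x = i,
    so the first k code-words are the message itself."""
    points = list(enumerate(message))
    return [lagrange_eval(points, x) for x in range(n)]


def decode(received, k):
    """Recover the k messages from any k code-words.
    `received` maps code-word index -> symbol."""
    points = list(received.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


message = [10, 20, 30, 40]          # k = 4
codewords = encode(message, n=6)    # n = 6, tolerates 2 losses
for kept in combinations(range(6), 4):
    subset = {i: codewords[i] for i in kept}
    assert decode(subset, k=4) == message
```

The final loop checks every possible choice of 4 surviving code-words out of 6, which is exactly what the MDS property promises.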

Implementations

Reed-Solomon (by Cauchy matrices [Byers, 1995])

Reed-Solomon (by Vandermonde matrices [Rizzo, 1998], now RFC 5510)

Cauchy “Good” [Plank, 2008] in Jerasure 1.2

Intel ISA-L (includes SSE instructions)

...

Mojette erasure coding

Based on the discrete Radon transform

(1+ε) MDS code

Linear complexity

Systematic and non-systematic versions

Fig. 1: Computing two projections, (-2,1) and (0,1)
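The computation of a projection can be sketched in a few lines. This is a minimal illustration, not the RozoFS code: it assumes the usual Mojette convention in which the element at column k and row l contributes to bin b = -q*k + p*l for direction (p, q), and it sums with ordinary integer addition (an implementation could use XOR instead). The function name `mojette_projection` and the sample grid are hypothetical.

```python
def mojette_projection(grid, p, q):
    """Return the bins of the projection of `grid` along direction (p, q).

    grid[l][k] is the element at row l, column k. Each element is added to
    bin b = -q*k + p*l, shifted so that the smallest bin index is 0.
    """
    rows, cols = len(grid), len(grid[0])
    raw = [-q * k + p * l for l in range(rows) for k in range(cols)]
    offset = min(raw)
    bins = [0] * (max(raw) - offset + 1)
    for l in range(rows):
        for k in range(cols):
            bins[-q * k + p * l - offset] += grid[l][k]
    return bins


grid = [[1, 2, 3],
        [4, 5, 6]]

# The two directions named in Fig. 1
print(mojette_projection(grid, 0, 1))   # [9, 7, 5]
print(mojette_projection(grid, -2, 1))  # [6, 5, 7, 2, 1]
```

Direction (0,1) simply sums each column, while (-2,1) spreads the same elements over more bins: for a grid of `cols` by `rows` elements, a projection has (rows-1)*|p| + (cols-1)*|q| + 1 bins, so projections are slightly larger than the data they protect.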

Optimizations

Deterministic reconstruction path (a function of the pattern of available projections) [Normand, 2006]

+ Drastic reduction in writes [Fizians engineers, 2013]

RozoFS

I/O-centric distributed file system

– POSIX scale-out storage

– Commodity hardware

– Fault tolerance (up to 4 failures)

– Based on erasure coding (Mojette coding)

– Dedicated to cold and hot data

Open source project
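To see why tolerating 4 failures with erasure coding is cheaper than with copies, the raw storage needed per useful byte can be compared. This is a rough sketch under stated assumptions: an ideal MDS code with n = k + failures code-words, k = 8 chosen only for illustration, and the small extra cost of a (1+ε) MDS code ignored. The function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(failures):
    """Raw bytes stored per useful byte when keeping failures + 1 full copies."""
    return Fraction(failures + 1)


def erasure_overhead(k, failures):
    """Raw bytes stored per useful byte for an ideal MDS code
    with k data parts and n = k + failures code-words."""
    return Fraction(k + failures, k)


k, failures = 8, 4  # k = 8 is an illustrative assumption
print(float(replication_overhead(failures)))  # 5.0
print(float(erasure_overhead(k, failures)))   # 1.5
```

Under these assumptions, surviving 4 failures costs 5 times the data size with replication, against 1.5 times with an erasure code.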

Figure: RozoFS architecture. A client node runs rozofsmount, the metadata server runs exportd, and each storage node in the pool of storage nodes runs storaged. Labels: metadata, data path, control path, monitoring.

Read/write function (in non-systematic coding)

Testbed

Performances

Sequential access (4K blocks)

Performances

Random access (4K blocks)

Figure: deployment diagram. Labels: GigE infrastructure (data storage and metadata) + standard GigE infrastructure (client/application level); external network; RozoFS; exportd; rozofsmount; four nodes labelled "Rozofsmount Storage".

Conclusions

RozoFS is an I/O-centric distributed file system based on an erasure code (always)

Performances: 100K IOPS, throughput of 6 Gbps...

RozoFS follows up the infrastructure

Apps: online video editing, virtualisation (QEMU), databases...

Contributes to the convergence of cold and hot data

Next: privacy (to check), Grid'5000 experiments (to come), deduplication (to attach)

Credits

https://github.com/rozofs

UMR 6597

Backup slides

Server type: Fujitsu RX300-S8 (R3008S0035FR)

CPU model name: 2 x Intel Xeon CPU E5-2650 v2 @ 2.60GHz (8 cores and 16 threads per CPU)

Memory: 64 GB

RAID card: RAID Controller SAS 6Gbit/s 1GB (D3116C)

Virtual drive 0: Seagate Constellation.2, SAS 6Gb/s, 1TB, 2.5", 7200 RPM (ST91000640SS), 11 drives, RAID 5

Virtual drive 1: Seagate Pulsar.2, SAS 6Gb/s, 100GB, 2.5", MLC (ST100FM0002), 1 drive, RAID 0

Virtual drive 2: WD Xe, SAS 6Gb/s, 900GB, 2.5", 10000 RPM (WD9001BKHG), 4 drives, RAID 0

Ethernet controllers:

– Intel 82599EB 10-Gigabit SFI/SFP+, 2 x 10Gb

– Intel I350 Gigabit Network, 2 x 1Gb

– Intel I350 Gigabit Network, 4 x 1Gb