Synchronizing Lustre file systems


Page 1: Synchronizing Lustre file systems

Synchronizing Lustre file systems

Dénes Németh ([email protected])

Balázs Fülöp ([email protected])

Dr. János Török ([email protected])

Dr. Imre Szeberényi ([email protected])

Page 2: Synchronizing Lustre file systems

The current state of the art

• Partially solved
  – Conventional local file systems
  – Off-line operation (rsync)

• Problems
  – Walk through the directory structure
  – Have to know what will change (inotify)
  – Does not work on distributed file systems
  – Scalability problems
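
The cost of walking the directory structure can be seen in a minimal sketch. It assumes a plain snapshot comparison; the helper names `snapshot` and `diff` are illustrative and are not taken from rsync or any Lustre tool. Every run has to visit every file, even when nothing has changed:

```python
import os
import tempfile


def snapshot(root):
    """Walk the whole tree and record (size, mtime_ns) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff(old, new):
    """Compare two snapshots; the work grows with the total number of files."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return created, deleted, modified


def demo():
    """Build a small tree, change it, and report what the walk detects."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("one")
        with open(os.path.join(root, "b.txt"), "w") as f:
            f.write("two")
        before = snapshot(root)

        os.remove(os.path.join(root, "b.txt"))
        with open(os.path.join(root, "a.txt"), "w") as f:
            f.write("one, changed")
        with open(os.path.join(root, "c.txt"), "w") as f:
            f.write("three")
        after = snapshot(root)

        return diff(before, after)


if __name__ == "__main__":
    print(demo())
```

On a file system with many clients writing at once, a full walk like this is both slow and out of date by the time it finishes, which is the scalability problem named above.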

Page 3: Synchronizing Lustre file systems

The environment - Lustre

• Distributed
  – Stripes (parts of a file) on separate hosts
  – ~100-1000 clients (reading and writing)

• Redundant
  – File system and file metadata

• Fault tolerance
  – Transaction-driven operations
  – Rollback capability

Page 4: Synchronizing Lustre file systems

Lustre – synchronization

• Distributed
  – Hosts: absolute event sequencing
    • Is the time accurate enough?
  – Clients: extreme efficiency

• Redundant, fault tolerant
  – Pulling the plug during synchronization
    • Moving, tracking events
  – Rollback: synchronize to transactions

Page 5: Synchronizing Lustre file systems

The basic Lustre concept

[Diagram labels: Lustre Server Side (Metadata Server, failover, Object Storage Targets); Lustre Client Side (~100-1000 clients); „inode”]

Page 6: Synchronizing Lustre file systems

Moving the information - metadata

[Diagram labels: Lustre Server Side (Metadata Server, Object Storage Targets); Lustre Client Side (~100-1000 clients); Lustre Metadata Access; Kernel space; Local Event Sequencer; Global Event Sequencer; Event Reporter; Event Multiplexer; Event Processor]

Page 7: Synchronizing Lustre file systems

How to move the information

[Diagram labels: Metadata Server; Local Event Sequencer; Global Event Sequencer; Event Reporter; Event Multiplexer; Event Processor; Block Device; Proc File System; TCP/IP Network (three links)]

Block Device

• Asynchronous notification

• System calls:
  – select (timeout)
  – read, write (blocking)

• Max. 100,000 events/sec

• Relatively complicated access
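
The access pattern listed here (select with a timeout, then a blocking read) can be sketched as follows. This is only an illustration: a pipe stands in for the event device, and the 16-byte record layout (sequence number plus event code) is a hypothetical format, since the slides do not specify one. It assumes a POSIX system, where `select` works on pipe descriptors.

```python
import os
import select
import struct

# Hypothetical record layout: sequence number and event code, both uint64.
RECORD = struct.Struct("<QQ")


def read_events(fd, timeout):
    """Wait up to `timeout` seconds for data, then read one record at a time."""
    events = []
    while True:
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            break  # timeout expired: no further notification arrived
        data = os.read(fd, RECORD.size)  # blocking read of one record
        if len(data) < RECORD.size:
            break  # end of stream or short read
        events.append(RECORD.unpack(data))
    return events


def demo():
    """Write three records into a pipe and read them back."""
    r, w = os.pipe()  # stand-in for the event device
    try:
        for seq in (1, 2, 3):
            os.write(w, RECORD.pack(seq, 100 + seq))
        return read_events(r, timeout=0.1)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(demo())
```

The timeout keeps the reader from blocking forever when no events arrive, at the price of the extra bookkeeping that makes this access path relatively complicated.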

Proc File System

• Easy access from user-space

• Notifications through signals

• Possibility for multiple reporters

• Minimal network usage

• Usually not a bottleneck

• ER & EM can be deployed together or separately

TCP/IP Network

• Just multiplexing events

• No problems

• No authorization or registration (fixed configuration)

TCP/IP Network

• Big difficulties

• Sequencing = Accurate timing

• Network delay

• Delay from FS overload

• Connection to all MDS

• Can be a bottleneck
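
Why sequencing depends on accurate timing can be shown with a minimal sketch of a global sequencer. It assumes that each local sequencer delivers its events already ordered and stamped with a local timestamp; the tuple layout and the function names are illustrative, not the authors' implementation.

```python
import heapq


def global_sequence(local_streams):
    """Merge per-MDS event streams by timestamp and number them globally.

    Each stream is an iterable of (timestamp, host, event) tuples that is
    already sorted by timestamp. The result is only as good as the clocks.
    """
    merged = heapq.merge(*local_streams)
    return [(seq, host, event)
            for seq, (_ts, host, event) in enumerate(merged, 1)]


def demo():
    """Accurate clocks: the rename on B falls between the two events on A."""
    mds_a = [(1.000, "A", "create /x"), (1.004, "A", "unlink /x")]
    mds_b = [(1.002, "B", "rename /y /z")]
    return global_sequence([mds_a, mds_b])


def demo_skew():
    """Same events, but the clock of B runs 5 ms behind."""
    mds_a = [(1.000, "A", "create /x"), (1.004, "A", "unlink /x")]
    mds_b = [(0.997, "B", "rename /y /z")]
    return global_sequence([mds_a, mds_b])


if __name__ == "__main__":
    print(demo())
    print(demo_skew())
```

With a skew of only a few milliseconds the rename is numbered first, although it happened after the create. Network delay and file system overload shift the arrival of events in a similar way.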

Page 8: Synchronizing Lustre file systems

Accurate sequencing

[Figure: output versus number of local sequencers; the output increases linearly]

Page 9: Synchronizing Lustre file systems

Average sequence performance

• Server has enough threads: performance OK

• Server needs more threads: performance drops

• Why? ~5000 events/thread

• „Graceful degradation”

• Linear drop in performance

• Constant QoS
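
The figure of roughly 5000 events per thread suggests a simple sizing rule. The sketch below assumes that capacity scales linearly with the number of threads, which the slides do not state explicitly; the function names are illustrative.

```python
import math

EVENTS_PER_THREAD = 5000  # approximate per-thread figure quoted above


def threads_needed(events_per_sec):
    """Smallest thread count whose combined capacity covers the load."""
    return max(1, math.ceil(events_per_sec / EVENTS_PER_THREAD))


def has_enough_threads(threads, events_per_sec):
    """True while the server is in the 'performance OK' regime."""
    return threads * EVENTS_PER_THREAD >= events_per_sec


if __name__ == "__main__":
    print(threads_needed(100_000))
```

Under this assumption, the maximum of 100,000 events/sec quoted for the block device would call for about 20 threads.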

Page 10: Synchronizing Lustre file systems

Resource usage on the global sequencer

At most 2 ms in each second (0.2%), practically zero

Page 11: Synchronizing Lustre file systems

How to commit the changes

[Diagram labels: SFS 1, SFS 2, SFS 3; MDS and OST (three instances); Committer Client and Event Processor (three instances); Event Reporter and Event Multiplexer (two instances each); sources A and B; events A4 and B3]

How to execute „3” if „4” has already happened?

Unfortunately, there is no really good solution

Page 12: Synchronizing Lustre file systems

Event sequence error resolution

1. Ostrich policy
  • Drop all events with a conflicting sequence

2. Conflict detection
  • Is the event applicable?
  • In design stage …

3. Replaying the already committed events
  • Currently not supported by Lustre
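
The first strategy can be sketched as follows. The sketch assumes that every event carries a global sequence number and that the committer remembers the highest number it has applied; the class and field names are illustrative, not the authors' code.

```python
class OstrichCommitter:
    """Applies events in arrival order and drops any that arrive too late."""

    def __init__(self):
        self.last_committed = 0
        self.applied = []
        self.dropped = []

    def handle(self, seq, event):
        """Apply the event unless a later one has already been committed."""
        if seq <= self.last_committed:
            # Conflicting sequence: ignore the event instead of resolving it.
            self.dropped.append((seq, event))
            return False
        self.applied.append((seq, event))
        self.last_committed = seq
        return True


def demo():
    """Event 4 from source A overtakes event 3 from source B."""
    committer = OstrichCommitter()
    for seq, event in [(1, "A1"), (2, "B2"), (4, "A4"), (3, "B3"), (5, "A5")]:
        committer.handle(seq, event)
    return committer.applied, committer.dropped


if __name__ == "__main__":
    print(demo())
```

The dropped event is simply lost, so the target file system may no longer match the source. Conflict detection and replay are listed as alternatives for that reason.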

Page 13: Synchronizing Lustre file systems

Questions?

Thank you for your attention!