IBM Research Lab at Haifa Object Storage and beyond Julian Satran

Page 1: IBM blue-and-white templateData Sharing Facility (DSF) A highly scalable research file system which incorporated an object-store like component to ensure local space allocation Storage

IBM Research Lab at Haifa

Object Storage and beyond

Julian Satran

Page 2:

From Direct Attached Storage to Network Attached Storage…

[Figure: DAS and NAS stacks. DAS labels: Application, File system, Database, Block Storage, Server. NAS labels: Application, Network, File system, Database, Block Storage, Server.]

Page 3:

…to Storage Area Networks

[Figure: SAN stack. Labels: Application, File system, Database, Server, Network, Block Storage, Storage Server.]

Page 4:

Combined Technologies

[Figure: three tiers joined by two networks. Labels: Application, Application Server, Network, File system, Database, File/Database Server, Network, Block Storage, Storage Server.]

• Network: IP; Protocol: NFS, CIFS
• Network: FibreChannel/IP; Protocol: FCP/iSCSI (SCSI)

Page 5:

SAN promise – Unmediated Access to Data

[Figure: Application Server, File/Database Server (File system, Database) and Storage Server (Block Storage) joined by networks, with separate Control and Data paths.]

Issue – security!

Page 6:

SAN promise – Disk Sharing

[Figure: several Application Servers, a File/Database Server (File system, Database) and several Storage Servers (Block Storage) joined by networks, with separate Control and Data paths.]

Issue – security

Page 7:

Object Storage

              Today's Block Device    Object Store
Operations    read block              read object offset
              write block             write object offset
                                      create object
                                      delete object
Security      Weak                    Strong
              Full disk               Per Object
Allocation    External                Local

Page 8:

Object Store Operations

• Basic Operations
  • Create Object
  • Delete Object
  • Write Offset in Object
  • Read Offset in Object
• Administrative Operations
• Basic abstract flow
  • Create an object, getting back an object ID
    • Client's responsibility to remember the ID
  • Send requests to read and write the object given the ID
  • Delete the object when done using it
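The basic abstract flow above can be sketched as a toy in-memory store. This is only an illustration under assumed names (`InMemoryObjectStore` and its methods are hypothetical, not the Antara or T10 interface), and it leaves out credentials and administrative operations.

```python
import itertools


class InMemoryObjectStore:
    """Toy object store: each object is a growable byte array keyed by its ID."""

    def __init__(self):
        self._objects = {}
        self._next_id = itertools.count(1)

    def create(self):
        # The store picks the ID; the client must remember it.
        oid = next(self._next_id)
        self._objects[oid] = bytearray()
        return oid

    def write(self, oid, offset, data):
        buf = self._objects[oid]
        end = offset + len(data)
        if end > len(buf):
            # Grow the object; the gap (if any) is zero-filled in this toy.
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data

    def read(self, oid, offset, length):
        return bytes(self._objects[oid][offset:offset + length])

    def delete(self, oid):
        del self._objects[oid]
```

A client would call `create()`, keep the returned ID, issue `write` and `read` calls against that ID, and call `delete` when done.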

Page 9:

Object Store Security

• All operations are secured by a credential
• Security achieved by cooperation of:
  • Admin - authenticates, authorizes and generates credentials.
  • ObS - validates credential that a host presents.
• Credential is cryptographically hardened
  • ObS and admin share a secret
• Goals of Object Store security are:
  • Increased protection/security
    • At level of objects rather than whole LUs
    • Hosts do not access metadata directly
  • Allow non-trusted clients to sit in the SAN
  • Allow shared access to storage without giving clients access to all data on volume

[Figure: Client sends an Authorization Req to the Security Admin and receives a Credential; Client presents the Credential to the Object Store; Security Admin and Object Store hold a Shared Secret.]

Page 10:

Object Store Operations: Read

• Read
  • Parameters: Object Store ID, Object ID, Offset, Length, Credentials
• Basic steps an object store must provide
  • Receive request
  • Validate credentials
  • Find allocation data for indicated object
  • Map offset and length to a collection of LBAs in an underlying block storage
  • Stage the data if necessary
  • Gather the data and return to the host
• Issues and Variants
  • Block alignment
  • Read of non-allocated data (sparse vs. past "end" of object)
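The mapping step of the read path can be illustrated with a small sketch. Everything here is an assumption made for illustration: a 512-byte block, a per-object dictionary from block index to LBA, zero-fill for sparse holes, and a short read past the end of the object. The slide lists these behaviours as open variants, and credential validation and staging are omitted.

```python
BLOCK_SIZE = 512  # assumed block size


def map_range_to_lbas(block_table, offset, length):
    """Split (offset, length) into per-block pieces: (lba or None, start, nbytes)."""
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        index, start = divmod(pos, BLOCK_SIZE)
        nbytes = min(BLOCK_SIZE - start, end - pos)
        # None marks a block of the object that was never allocated.
        pieces.append((block_table.get(index), start, nbytes))
        pos += nbytes
    return pieces


def read_object(disk, block_table, object_size, offset, length):
    """Gather the requested bytes from the underlying blocks."""
    # Assumed policy: reads past the end of the object are truncated.
    length = max(0, min(length, object_size - offset))
    out = bytearray()
    for lba, start, nbytes in map_range_to_lbas(block_table, offset, length):
        if lba is None:
            out += bytes(nbytes)  # assumed policy: holes read as zeros
        else:
            out += disk[lba][start:start + nbytes]
    return bytes(out)
```

The unaligned case falls out of `divmod`: the first and last pieces simply cover part of a block.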

Page 11:

Object Store Operations: Write

• Write
  • Parameters: Object Store ID, Object ID, Offset, Length, Credentials, Data
• Basic steps an object store must provide
  • Receive request
  • Validate credentials
  • Find allocation data for indicated object
  • Determine if the indicated range is already bound to a collection of underlying LBAs
  • If not already bound
    • Determine the mapping
    • Update the metadata
  • Destage the data to the indicated LBAs
• Issues/Variants
  • Use of a non-volatile write cache
  • Late binding
  • Hardening the metadata updates
  • Ensuring metadata is only updated once the data is hardened
  • Block alignment
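A minimal sketch of the write path, under stated assumptions: one object per store, a 512-byte block, a free list standing in for real allocation, and an in-memory list standing in for the disk. `ToyStore` is a hypothetical name. The one ordering point it tries to show is that new bindings are published to the block table only after the data has been written.

```python
BLOCK_SIZE = 512  # assumed block size


class ToyStore:
    """Single-object toy store illustrating bind-if-unbound, destage, then metadata update."""

    def __init__(self, num_blocks):
        self.disk = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # free LBAs, lowest first
        self.block_table = {}                # object block index -> LBA
        self.size = 0

    def write(self, offset, data):
        end = offset + len(data)
        pending = {}  # bindings chosen for this write, not yet in the metadata
        pos = offset
        while pos < end:
            index, start = divmod(pos, BLOCK_SIZE)
            nbytes = min(BLOCK_SIZE - start, end - pos)
            lba = self.block_table.get(index)
            if lba is None:
                # Range not bound yet: determine the mapping now.
                lba = pending[index] = self.free.pop(0)
            # Read-modify-write handles writes that are not block aligned.
            block = bytearray(self.disk[lba])
            chunk = data[pos - offset:pos - offset + nbytes]
            block[start:start + nbytes] = chunk
            self.disk[lba] = bytes(block)  # destage the data
            pos += nbytes
        # Metadata is updated only after the data is on "disk".
        self.block_table.update(pending)
        self.size = max(self.size, end)
```

A real store would also have to log or otherwise harden the metadata update; this sketch does not model that.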

Page 12:

Capability Structure

• Type
  • Does the credential apply to a specific object or entire object store
• Object Rights
  • Read, Write, Append, Truncate, Create (given an ID), Delete, Info
• Object Store (ObS) Rights
  • Format, Create (ObS generates ID), Info on Object Store
• Ver(sion)
  • Used to allow the credential to time out

[Figure: capability layout, field order as extracted: Type | Object ID | ObS ID | Object Rights (R W A T C D I) | ObS Rights (F C* I*) | Ver]
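One way to picture the capability fields is as bit flags plus identifiers. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, the Type field is modelled as `object_id` being `None` for a store-wide capability, and the version check (equality against a current version held by the store) is only one possible reading of "used to allow the credential to time out".

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Optional


class ObjectRights(Flag):
    READ = auto()
    WRITE = auto()
    APPEND = auto()
    TRUNCATE = auto()
    CREATE = auto()   # create with a client-given ID
    DELETE = auto()
    INFO = auto()


class StoreRights(Flag):
    FORMAT = auto()
    CREATE = auto()   # create with a store-generated ID
    INFO = auto()


@dataclass(frozen=True)
class Capability:
    obs_id: int
    object_id: Optional[int]   # None: applies to the entire object store (assumed encoding of Type)
    object_rights: ObjectRights
    store_rights: StoreRights
    version: int

    def permits(self, obs_id, object_id, right, current_version):
        """Check an object-level right against this capability."""
        if obs_id != self.obs_id or self.version != current_version:
            return False  # wrong store, or capability has timed out (assumed rule)
        if self.object_id is not None and object_id != self.object_id:
            return False
        return bool(self.object_rights & right)
```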

Page 13:

Credential Structure

• Capability
  • Operations that the credential entitles
• Nonce
  • A salt generated by the Admin
  • Different for every credential
• MAC -- Message Authentication Code
  • Standard cryptographic hash on the capability and the encrypted secret
  • Ensures host cannot alter/forge a credential
• Secret
  • The (un-encrypted) secret
  • Used by the client to verify that it got the credential from the Admin

[Figure: credential layout: Capability | Nonce | MAC | Secret. Labels: Admin, Client, ObS, public credential, private credential.]
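The idea that a host cannot alter or forge a credential can be sketched with a keyed MAC. This is not the scheme on the slide: it swaps in a plain HMAC-SHA256 over the capability and nonce, keyed with the secret shared by Admin and ObS, and it leaves out the private part of the credential. The function names and the byte-string encoding of the capability are hypothetical.

```python
import hashlib
import hmac
import os


def issue_credential(shared_secret: bytes, capability: bytes) -> dict:
    """Admin side: bind a capability to a fresh nonce with a MAC."""
    nonce = os.urandom(16)  # different for every credential
    mac = hmac.new(shared_secret, capability + nonce, hashlib.sha256).digest()
    return {"capability": capability, "nonce": nonce, "mac": mac}


def validate_credential(shared_secret: bytes, cred: dict) -> bool:
    """ObS side: recompute the MAC from the presented fields and compare."""
    expected = hmac.new(
        shared_secret, cred["capability"] + cred["nonce"], hashlib.sha256
    ).digest()
    return hmac.compare_digest(expected, cred["mac"])
```

Because the client never holds the shared secret, changing any bit of the capability invalidates the MAC when the object store recomputes it.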

Page 14:

SAN File System without an Object Store

[Figure: Client, File Manager and Block Device, with arrows numbered 1 to 7.]

1. Create File
2. Return allocation bitmap
3. I/O (unsecured)
4. Request additional allocation bitmap
5. Return allocation bitmap
6. I/O (unsecured)
7. Return actual allocation data to File Manager

Page 15:

SAN File System with an Object Store

[Figure: Client, File Manager and Object Store, with arrows numbered 1 to 4.]

1. Create File
2. Return Locks and Credentials
3. Create Object and perform I/O with Credentials
4. I/O with Credentials

Page 16:

Blocks vs. Objects vs. Files

Block Storage
• Fastest
• Small set of operations
• No connection to user's abstraction
• Dumb device
• No access security

Object Storage
• Fast
• Small set of operations
• Object usually maps to user abstraction
• Local space management
• Enable end-to-end management
• Access is Secure

File Storage
• Slow
• Rich set of operations (sometimes more than what the application requires)
  • Locking
  • Hierarchical name service
  • …
• File usually maps to user abstraction
• Access is Secure (strength depending on the protocol)

Page 17:

How Real Are Object Stores?

• Object Stores
  • First big push
    • Garth Gibson, et al., NASD -- CMU, Panasas
  • DSF Storage Manager
  • Antara – a prototype Object Store
  • Lustre and Object Store Target (LLNL)
  • CAS (EMC Centera – A Write-Once ObS)
  • . . .
• Drivers
  • iSCSI
    • IP access to storage exacerbates SAN security problems
  • Data Sharing Facility (DSF)
    • A highly scalable research file system which incorporated an object-store like component to ensure local space allocation
  • Storage Tank and other SAN file systems
    • Shared access requires SAN security (or trusted clients!)

Page 18:

What is Antara*?

• A prototype implementation of an object store as a standalone control unit
• First generation ObS prototype
  • Generation zero: DSF Storage Manager
  • Generation two…
• Initial focus on integrity and recoverability of metadata and scalable design
• Current focus on demonstrating "reasonable" performance
• Multiple versions have been developed
• Features
  • IP connectivity (but design is fairly transport independent)
  • Supports the Antara ObS protocol
    • Similar in functionality to T10 draft but not SCSI
  • Commands currently supported:
    • Open/Close Session, Create/Delete Object, Read/Write/Append, Truncate, . . .
  • Export a single object store
  • Current version assumes a non-volatile store
    • Prior versions did not have this assumption

* "interior" in Sanskrit

Page 19:

General Design

• A conceptual pipeline
  • Uses completion ports to avoid context switches
  • In most cases the same operating system thread handles multiple stages
• The modules are:
  • I_Module (communication input, connection management)
  • S_Module (security)
  • C_Module (control and dispatching)
  • L_Module (lookup, metadata tables and locking mechanisms)
  • RW_Module (read-write of data and log of metadata)
  • O_Module (communication output)
• The C, L, and S modules are mostly object store unique
• I, RW, and O provide functions that are in most control units

[Figure: Abstraction of Antara Flow. Modules I_Module, S_Module, C_Module, L_Module, RW_Module and O_Module, with "start new request" and "error path" edges.]

Page 20:

Flow for Read and Write in Antara

1 (I) Receive CB, determine needed buffers, allocate and receive data into buffers
2 (C) Mark the request as running; delay this request if it clashes with another operation
3 (C,I) Notify OS ready to receive the next request from the client
  • Additional requests handled in parallel
4 (S) Check security (e.g., proper session, credentials, etc.)
5 (L) Perform necessary lookups (and/or allocations in case of writes)
  • Put allocation information in the request
6 (RW) Read/write the data from the cache
7 (L) Mark the request as done
  • Allows delayed requests to start, e.g., on a read-write conflict, although the read response may need to wait for log complete
8 (L) Log the request (this is a no-op in the case of read)
  • Ensures hardening of the allocation information
9 (L) Mark the request as logged (this is a no-op in the case of read)
  • Allows delayed requests to complete
10 (O) Send the reply
11 (L) Mark the request as sent
12 (O) Deallocate buffers used for the request

[Figure: module flow diagram with I_Module, S_Module, C_Module, L_Module, RW_Module and O_Module, and "start new request" and "error path" edges.]
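The staged flow, including the error path that skips straight to output, can be sketched as a list of stage functions applied to a request. This is a single-threaded toy under assumed names (the stage functions, `RequestError` and the request dictionary are hypothetical); it does not model completion ports, parallel requests, or the delay and log bookkeeping in steps 2, 7 and 9.

```python
class RequestError(Exception):
    """Raised by a stage to send the request down the error path."""


def receive(req):
    req.setdefault("data", b"")


def control(req):
    req["state"] = "running"


def security(req):
    if not req.get("credential_ok"):
        raise RequestError("bad credential")


def lookup(req):
    req["lbas"] = [0]  # placeholder allocation information


def read_write(req):
    req["state"] = "done"


def log(req):
    req["state"] = "logged"


# Stage order follows steps 1, 2, 4, 5, 6 and 8 of the flow above.
STAGES = [("I", receive), ("C", control), ("S", security),
          ("L", lookup), ("RW", read_write), ("L", log)]


def handle(req):
    """Run a request through the stages; any failure jumps to the output stage."""
    req["trace"] = []
    try:
        for name, stage in STAGES:
            req["trace"].append(name)
            stage(req)
        req["status"] = "ok"
    except RequestError as exc:
        req["status"] = f"error: {exc}"
    req["trace"].append("O")  # the reply is sent on both paths
    return req
```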

Page 21:

Metadata

• Significant effort invested in designing efficient, scalable data structures
  • Minimize contention for concurrent clients and multiple processors
• Object Directory maps from OID to object's metadata via a sparse hash table
• Object metadata includes object size and a block number table
• Block number table maps from offset in block to extents
  • Small objects use a dynamic, linear-probe hash table.
  • Large objects use a b-tree.
• Free space managed via a buddy list bitmap

[Figure: Object Directory points to Object Meta Data, which points to a Block Number Table; small objects are implemented as hash tables, large objects as b-trees.]

Page 22:

Metadata (cont)

• The linear-probe hash table used for mapping OIDs to OMDs was chosen for parallelism
  • Not all of the metadata can fit into main memory.
  • Some metadata accesses will require disk accesses.
  • Shared/exclusive locks, created on-the-fly, allow locking only specific entries of the hash table.
• The block-number table implementations were chosen for minimizing page-faults
  • For large objects, b-trees were chosen based on similar choices in databases.
  • For small objects, an entire b-tree page is inefficient in terms of space and time.
• Entries in block number table represent extents
  • Extents range from one block up to the maximum amount of data transferred in a write request

Page 23:

Antara Space Allocation

• Goals
  • Logically consecutive blocks of an object will map to consecutive LBAs
  • Avoid fragmentation
  • Quick allocation decisions
• Approach
  • Maintain an MRU cache of objects receiving allocating writes
  • Non-persistently associate with each object a large extent of consecutive LBAs
  • For an allocating write, take space from the large extent at the appropriate offset
    • Persistently update free space bitmap
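The allocation approach can be illustrated with a toy allocator. All names and parameters here are hypothetical, and several simplifications are assumed: reservations come from a bump pointer rather than a buddy bitmap, an in-memory list stands in for the persistent free space bitmap, and space reserved for an evicted object is never reclaimed.

```python
from collections import OrderedDict


class ExtentAllocator:
    """Toy allocator: each recently written object gets a reserved run of LBAs."""

    def __init__(self, total_blocks, reserve_blocks=8, cache_size=2):
        self.bitmap = [False] * total_blocks  # stands in for the persistent bitmap
        self.reserve_blocks = reserve_blocks
        self.cache_size = cache_size
        # Non-persistent cache: object id -> (first logical block, first LBA).
        self.reserved = OrderedDict()
        self.next_free = 0  # naive bump pointer for new reservations

    def _reserve(self, oid, logical_block):
        if len(self.reserved) >= self.cache_size:
            self.reserved.popitem(last=False)  # evict the least recently used object
        base = self.next_free
        if base + self.reserve_blocks > len(self.bitmap):
            raise RuntimeError("out of space")
        self.next_free += self.reserve_blocks
        self.reserved[oid] = (logical_block, base)

    def allocate(self, oid, logical_block):
        """Return the LBA for one newly written logical block of an object."""
        entry = self.reserved.get(oid)
        in_range = (entry is not None and
                    entry[0] <= logical_block < entry[0] + self.reserve_blocks)
        if not in_range:
            self.reserved.pop(oid, None)
            self._reserve(oid, logical_block)
            entry = self.reserved[oid]
        self.reserved.move_to_end(oid)  # mark as most recently used
        # Take space at the matching offset inside the reserved extent.
        lba = entry[1] + (logical_block - entry[0])
        self.bitmap[lba] = True  # the "persistent" update
        return lba
```

Even when writes to two objects interleave, each object's logical blocks land on consecutive LBAs inside its own reserved extent, which is the first goal listed above.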

Page 24:

Storage Tank Background (IBM Research Almaden)

[Figure: ST clients (UNIX, Win) serving CIFS/NFS, connected over a Control Network (IP) to a cluster of Meta-data Servers with a Meta-data store, and over a SAN (IP/FC) to Data and Backup storage.]

Capabilities:
• Performance and semantics similar to local file system
• Sharing like NAS
• Policy-based, centralized storage management

Page 25:

Object Store and Storage Tank

• Storage Tank was designed with an Object Store in mind
• Object Store will be at same level as control units
• To integrate an Object Store into Storage Tank requires changes to
  • Meta Data Server
  • Storage Tank Client
• Main customer benefit will be security and protection
  • Other benefits will follow
• ObS initiator is the callable module that interacts with the ObS Target
• Initiator integrated into both Storage Tank client and Meta Data Server
• Current Initiator comes in several flavors
  • Synchronous and Asynchronous
  • User-mode and Kernel-mode

Page 26:

Object Store added to Storage Tank

[Figure: ST agents (UNIX, Win) serving CIFS/NFS, connected over a Control Network (IP) to a cluster of Meta-data Servers with a Meta-data store, and over a SAN (IP/FC) to Data and Backup storage, now with an Object Store.]

Object Store added:
• SAN security
• Scalability
• Manageability

Page 27:

ObS vs. alternatives

• Complete segregation of the disk space (secure large entities)
  • Pros
    • No infrastructure change
  • Cons
    • No sharing
    • Requires careful synchronization
• NAS – the only complete alternative
  • Requires all applications to use the FS naming structure (heavy)
  • Does not scale

Page 28:

What next

• Data repositories with semantic access (next generation FS)
• Closer "coupling" with search and indexing engines
• Passive and active access models (push/pull)
• Richer security models (including securing data at rest)
• Multimodal data

Page 29:

Data repositories with semantic access (next generation FS)

• With FS taking over a billion files, access by file-name is cumbersome
• Content based indexes and "web-like" searching are far more convenient
• In specialized domains indexing can be based on manually created metadata
• The most common case is that of "indexing" extracted automatically from content (data, text and media)
• This technology is required everywhere – from desktops to large servers
• Current access schemes based on a preset DAG should be seen as just instances of more general organization schemes

Page 30:

Closer coupling with indexing and searching

• Who should do it
• Large repositories will continuously re-index content
• Issues/alternatives
  • Who does it?
  • How does storage support it (poll, interrupt, specialized data-structures)?

Page 31:

Passive and active access models (push/pull)

• Repositories are regularly accessed in push or pull mode
• Should storage support both, or should it keep the current pull (passive) mode only?

Page 32:

Richer security models (including securing data at rest)

• Security of data at rest is of major importance
• IEEE effort of standardization is timely
• There is a strong relationship between ObS and securing data at rest

Page 33:

Multimodal data

• Indexing and searching in multimedia is highly specialized
• Requires substantial computing power but unlikely to change frequently