Transcript: IBM Active Cloud Engine / Active File Management
Agenda
Why ACE?
Inside ACE
Use Cases
Data Movement across sites
How do you move data across sites today?
- FTP, parallel FTP
- SCP
- Backup to tape and FedEx
Issues:
- Pre-planned, user-initiated
- Replica management
- What if this data needs to move to multiple sites very frequently?
Data Movement between sites
What if there were a tool:
- That pulls data on demand, with no explicit user initiation
- That moves data periodically and smartly
- That moves only changed data
- That uses the network effectively
- That manages these replicas, keeping staleness under control?
Is there such a tool?
Panache/ACE/AFM
ACE Global provides:
- Seamless data movement between clusters: on demand, periodically, or continuously
- A persistent, scalable, POSIX-compliant cache for a remote file system, even during disconnection
Moving data between locations can be slow, copies can become stale once moved, and the moved data is not persistent.
[Diagram: one site writes; three other sites read the copies]
But customers need to collaborate immediately, with up-to-date changes.
Inside ACE
Panache Overview: Reads
[Diagram: the home site cluster exports /home/appl/data/web (spreadsheet.xls, drawing.ppt) over NFS/CIFS/HTTP; a GPFS Panache scale-out cache (interface nodes, storage nodes, storage array, gateway node, VFS layer) serves a remote user]
1. Remote user reads the local edge device for the file
2. On-demand read from the home site
3. Data is cached locally on disk
4. Reads can run disconnected
Asynchronous Write Back
[Diagram: a remote user writes into the Panache scale-out cache (interface nodes, storage nodes); updates flow back to /home/appl/data/web on the home cluster]
1. Remote user writes the file to the local edge device
2. Data is cached locally on disk; the write is logged to an in-memory queue
3. Updates are pushed back to home periodically, or when the network is connected
Asynchronous Updates (write, create, remove)
- Updates at the cache site are pushed back lazily, masking the latency of the WAN
- Data is written to GPFS at the cache site synchronously
- The gateway (GW) node queues each update for later execution; performance is identical to a local file system update
- Writeback is asynchronous, with a configurable async delay
- GW nodes queue updates and write back to home as network bandwidth permits
- Write back tends to coalesce updates and accommodates out-of-order and parallel writes to files and directories, maximizing WAN bandwidth utilization
- Users can force a sync if needed
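A minimal sketch of how the async delay and the forced sync above might look on the GPFS/Spectrum Scale command line; the file system name (store1) and fileset name (cache1) are hypothetical:

```shell
# Set the asynchronous write-back delay (in seconds) on a cache fileset.
# (store1/cache1 are illustrative names, not from the slides.)
mmchfileset store1 cache1 -p afmAsyncDelay=60

# Force all queued updates to be flushed to the home site now,
# instead of waiting for the async delay to elapse.
mmafmctl store1 flushPending -j cache1
```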
Expiration of Data
- Staleness control: defined based on time since disconnection
- Once a cache has expired, no access to the cache is allowed
- Manual expire/unexpire option for the admin
- Allowed only for read-only (RO) mode caches
- Disabled for single-writer (SW) and local-update (LU) caches, as they are themselves sources of data
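The manual expire/unexpire option might be exercised as follows on the GPFS/Spectrum Scale command line; the names (store1, cache1) are hypothetical and the fileset is assumed to be in read-only mode:

```shell
# Manually mark a read-only cache fileset as expired;
# cached data becomes inaccessible until it is unexpired.
mmafmctl store1 expire -j cache1

# Restore access to the cached data.
mmafmctl store1 unexpire -j cache1
```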
Panache WAN Caching Features

Feature                      | Panache support
-----------------------------|----------------------------------------------------
Writable cache               | Yes
Granularity                  | Fileset (directory tree)
Policy-based pre-fetching    | Yes (uses GPFS policy engine rules)
Policy-based cache eviction  | Yes (uses GPFS policy engine rules)
Disconnected-mode operations | Yes (can also expire based on a configured timeout)
Data transport protocol      | NFS (uses the standard to move data from any filer)
Streaming support            | Yes (GPFS policy rules select files to replicate)
Locking support              | No (only local cluster-wide locks)
Sparse file support          | Yes (can read as sparse files)
Namespace caching            | Yes (gets directory structure along with data)
Parallel data transfer       | Yes
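As a sketch of the policy-based pre-fetching row above: a list of files (here written by hand; in practice it could come from a GPFS policy engine LIST rule) can be pushed into the cache ahead of demand. File system, fileset, and path names are illustrative:

```shell
# Build a list of files to warm into the cache.
# (These paths are hypothetical examples.)
cat > /tmp/filelist <<EOF
/gpfs/store1/cache1/web/spreadsheet.xls
/gpfs/store1/cache1/web/drawing.ppt
EOF

# Pre-fetch the listed files from home into the cache fileset.
mmafmctl store1 prefetch -j cache1 --list-file /tmp/filelist
```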
Use Cases
Use Case: Central/Branch Office
- Data is created, maintained, and updated at the central site
- Branch/edge sites periodically prefetch (via policy) or pull on demand
- Data is revalidated when accessed
- A typical scenario is an iTunes-like music site
[Diagram: HQ primary site (writer); edge sites (readers) via periodic prefetch and on-demand pull]
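At an edge site, this use case might be set up as a read-only cache fileset pointing at the central site's NFS export; all names here (store1, webdata, homesite, paths) are hypothetical:

```shell
# Create a read-only AFM cache fileset whose home is the central site.
mmcrfileset store1 webdata -p afmMode=ro -p afmTarget=nfs://homesite/gpfs/home/appl/data/web --inode-space new

# Link it into the local namespace so branch clients can read it.
mmlinkfileset store1 webdata -J /gpfs/store1/web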
Use Case: Non-Dependent Writers
- Each site writes to its own dedicated fileset/directory
- A central system holds all the home directories; backup/HSM is managed out of it
[Diagram: User A's home directory (writer) and User B's home directory (writer), each cached back to a central backup site]
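Since each site owns its own directory here, the fileset at each writing site might be created in single-writer (SW) mode, with the central backup system as home; names (store1, userA, central, paths) are hypothetical:

```shell
# Create a single-writer cache fileset: this site is the only writer,
# and its updates flow back asynchronously to the central system.
mmcrfileset store1 userA -p afmMode=sw -p afmTarget=nfs://central/gpfs/home/userA --inode-space new

# Link it so local users see it as their home directory tree.
mmlinkfileset store1 userA -J /gpfs/store1/userA
```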
Use Case: Ingest and Disseminate
- The central site receives updates frequently
- Regional/edge sites can periodically prefetch or pull on demand
- Data is revalidated
[Diagram: data is ingested at one location (writer); edge sites use periodic pre-fetch and on-demand pull; the backup site uses periodic pull]
Use Case: Global Namespace (Mesh)
Three SONAS clusters (SONAS1.ibm.com, SONAS2.ibm.com, SONAS3.ibm.com); each is the home for two filesets and caches the other four:
- One site (file system store1): local filesets /data1, /data2 (home for data1 and data2); cache filesets /data3, /data4, /data5, /data6
- One site (file system store2): local filesets /data3, /data4 (home for data3 and data4); cache filesets /data1, /data2, /data5, /data6
- One site (file system store2): local filesets /data5, /data6 (home for data5 and data6); cache filesets /data1, /data2, /data3, /data4
At every site, clients connect to the same exports: SONAS:/data1, SONAS:/data2, SONAS:/data3, SONAS:/data4, SONAS:/data5, SONAS:/data6.
- Each cache site exports the same namespace view
- Every fileset is accessible from all sites
Thank You