Accelerating Data Management - Dave Fellinger - RDAP12
-
Upload
asist -
Category
Technology
-
view
897 -
download
2
description
Transcript of Accelerating Data Management - Dave Fellinger - RDAP12
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
Accelerating Data Management
03.20.2012
Dave Fellinger
Chief Scientist, Office of Strategy & Technology
Research Data Access and Preservation Summit
March 22, 2012
ddn.com
©2012 DataDirect Networks. All Rights Reserved.3
Data Management
Data management can consist of;► Access management
• Client access and maintenance• User permissions• Policies including data security• Policies including data continuance
► Data manipulation utilizing microservices• Data checking• Implementing processes such as reduction and filtering• Data registration and metadata extraction• Data migration
ddn.com
©2012 DataDirect Networks. All Rights Reserved.4
Sources of Service Latency
► Hardware Chain• Disk drive servo operation• Multiple SCSI layers• Multiple bus transitions• Memory bandwidth limitations• Network service latencies
► Software Chain• Memory copies• Kernel operations• Layers of consecutive operations including the service of V-nodes,
I-nodes and FAT• Serial data transport processes
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
What is ‘Embedded Processing’?And why ?
► Do data intensive processing as ‘close’ to the storage as possible.• Bring computing to the data instead of bring data to computing
► HADOOP is an example of this approach.
► Why Embedded Processing?► Moving data is a lot of work► A lot of infrastructure needed
► So how do we do that?
Client sends a request to storage (red ball)
Storage responds with data (blue ball)
Client
StorageBut what we really want is :
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
Storage with Virtual Machines
RAID ProcessorsRAID Processors
System memorySystem memory
High-Speed CacheHigh-Speed Cache
VirtualMachineVirtual
Machine
Interface VirtualizationInterface Virtualization
Internal SAS SwitchingInternal SAS Switching
Cache LinkCache Link
8 x IB QDR/10GbE Host Ports (No Fibre Channel)
VirtualMachineVirtual
MachineVirtual
MachineVirtual
MachineVirtual
MachineVirtual
Machine
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
Repurposing Interface Processors
► In the block based SFA10K platform, the IF processors are responsible for mapping Virtual Disks to LUNs on FC or IB
► In the SFA10KE platform the IF processors are running VMs► The OS running on those VMs uses a driver to access the
RAID processors directly► RAID processors place data (or use data) directly in the VM’s
memory► One hop from disk to VM’s memory
► Now the storage is no longer a block device► It is a storage appliance with processing capabilities
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
Example configuration
► Now we can put iRODS inside the RAID controllers► The iCAT processor has lots of memory and SSDs for DB storage► Either use all VMs for iRODS or add a parallel filesystem such as
GPFS for fast scratch► The filesystem uses SAS for frequent used files and SATA for the
rest► The following example is a mix of iRODS with GPFS
• This give iRODS the fastest access to the storage because it doesn’t have to go onto the network to access a fileserver. It lives inside the fileserver.
• The same filesystem is also visible from an external compute cluster via GPFS running on the remaining VMs
► This is only one controller, the 4 VMs on the other controller need some work too• They see the same storage and can access it at the same speed.
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
Example configuration
RAID ProcessorsRAID Processors
System memorySystem memory
VirtualMachineVirtualMachine
Interface VirtualizationInterface Virtualization
Internal SAS SwitchingInternal SAS Switching
8x 10GbE Host Ports
VirtualMachineVirtualMachine
VirtualMachineVirtualMachine
VirtualMachineVirtualMachine Linux
16 GB memory
allocated
SFA Driver
Linux
8 GB memory
allocated
RAID sets with 2TB SSD
RAID sets with 2TB SSD
Linux Linux
High-Speed CacheHigh-Speed Cache Cache LinkCache Link
GPFS
SFA Driver
GPFS
SFA Driver
GPFS
iCAT
SFA Driver
8GB memory
allocated
8GB memory
allocated
RAID sets with 300TB SATA
RAID sets with 300TB SATA
RAID sets with 30TB SAS
RAID sets with 30TB SAS
ddn.com
©2012 DataDirect Networks. All Rights Reserved.
Running Micro Services as a VM
► Since iRODS runs inside the controller we now can run iRODS MicroServices right on top of the storage.
► The storage has become an iRODS appliance ‘speaking’ iRODS natively.
► iRODS can execute “in-band” operations registering data and extracting metadata during injest.
► We could create ‘hot’ directories that kick off processing depending on the type of incoming data.
ddn.com
©2012 DataDirect Networks. All Rights Reserved.11
Conclusion
► The elimination of software or hardware layers increases reliability and decreases latency.
► Automated, policy based services can be run within a storage environment.
► Policy execution can include data migration, data checking, or data manipulation by calling or scheduling microservices.
Data intensive operations executed by a server can cause network traffic and SCSI bus transaction latency. Moving these operations to the storage is efficient and easily managed.
ddn.com
©2012 DataDirect Networks. All Rights Reserved.12
DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler, xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.
Questions?