Accelerating Data Management - Dave Fellinger - RDAP12

ddn.com ©2012 DataDirect Networks. All Rights Reserved. Accelerating Data Management 03.20.2012 Dave Fellinger Chief Scientist, Office of Strategy & Technology Research Data Access and Preservation Summit March 22, 2012


Accelerating Data Management Dave Fellinger, DataDirect Networks Presentation at Research Data Access & Preservation Summit 22 March 2012

Transcript of Accelerating Data Management - Dave Fellinger - RDAP12

Page 1: Accelerating Data Management - Dave Fellinger - RDAP12

ddn.com

©2012 DataDirect Networks. All Rights Reserved.

Accelerating Data Management

03.20.2012

Dave Fellinger

Chief Scientist, Office of Strategy & Technology

Research Data Access and Preservation Summit

March 22, 2012

Page 2: Accelerating Data Management - Dave Fellinger - RDAP12


Data Management

Data management can consist of:

► Access management
• Client access and maintenance
• User permissions
• Policies including data security
• Policies including data continuance

► Data manipulation utilizing microservices
• Data checking
• Implementing processes such as reduction and filtering
• Data registration and metadata extraction
• Data migration

Page 3: Accelerating Data Management - Dave Fellinger - RDAP12


Sources of Service Latency

► Hardware Chain
• Disk drive servo operation
• Multiple SCSI layers
• Multiple bus transitions
• Memory bandwidth limitations
• Network service latencies

► Software Chain
• Memory copies
• Kernel operations
• Layers of consecutive operations including the service of V-nodes, I-nodes and FAT
• Serial data transport processes
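One way to read the two chains above is as a serial pipeline: each layer a request crosses adds its own delay, so the service latency is roughly the sum of the per-layer delays. The sketch below illustrates that arithmetic only. The stage names come from the slide; the microsecond values are made-up placeholders, not measurements, and `total_latency_us` is a hypothetical helper.

```python
# Illustrative model only: layers crossed in series add their latencies.
# All numbers are hypothetical placeholders, not measured values.

def total_latency_us(stages):
    """Sum per-stage latencies (in microseconds) for a serial chain."""
    return sum(stages.values())

hardware_chain = {
    "disk drive servo operation": 5000,
    "SCSI layers": 50,
    "bus transitions": 20,
    "memory bandwidth": 10,
    "network service": 200,
}
software_chain = {
    "memory copies": 30,
    "kernel operations": 40,
    "v-node / i-node / FAT service": 60,
    "serial data transport": 100,
}

full_path = {**hardware_chain, **software_chain}

# Dropping layers (here the network hop and the memory copies) shortens the path.
removed = ("network service", "memory copies")
shorter_path = {name: us for name, us in full_path.items() if name not in removed}

print(total_latency_us(full_path), total_latency_us(shorter_path))
```

The point of the model is only that every removed layer subtracts its full delay from the total; real systems overlap some of these stages, which this sketch ignores.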

Page 4: Accelerating Data Management - Dave Fellinger - RDAP12


What is ‘Embedded Processing’? And why?

► Do data-intensive processing as ‘close’ to the storage as possible.
• Bring computing to the data instead of bringing data to computing

► Hadoop is an example of this approach.

► Why Embedded Processing?
► Moving data is a lot of work
► A lot of infrastructure is needed

► So how do we do that?

[Diagram: a Client and Storage. The client sends a request to storage (red ball); storage responds with data (blue ball).]

But what we really want is:

Page 5: Accelerating Data Management - Dave Fellinger - RDAP12


Storage with Virtual Machines

[Diagram labels: Virtual Machines; Interface Virtualization; RAID Processors; System memory; High-Speed Cache; Cache Link; Internal SAS Switching; 8 x IB QDR/10GbE Host Ports (No Fibre Channel)]

Page 6: Accelerating Data Management - Dave Fellinger - RDAP12


Repurposing Interface Processors

► In the block-based SFA10K platform, the IF processors are responsible for mapping Virtual Disks to LUNs on FC or IB
► In the SFA10KE platform the IF processors are running VMs
► The OS running on those VMs uses a driver to access the RAID processors directly
► RAID processors place data (or use data) directly in the VM’s memory
► One hop from disk to VM’s memory

► Now the storage is no longer a block device
► It is a storage appliance with processing capabilities

Page 7: Accelerating Data Management - Dave Fellinger - RDAP12


Example configuration

► Now we can put iRODS inside the RAID controllers
► The iCAT processor has lots of memory and SSDs for DB storage
► Either use all VMs for iRODS or add a parallel filesystem such as GPFS for fast scratch
► The filesystem uses SAS for frequently used files and SATA for the rest
► The following example is a mix of iRODS with GPFS
• This gives iRODS the fastest access to the storage because it doesn’t have to go onto the network to access a fileserver. It lives inside the fileserver.
• The same filesystem is also visible from an external compute cluster via GPFS running on the remaining VMs

► This is only one controller; the 4 VMs on the other controller need some work too
• They see the same storage and can access it at the same speed.
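The SAS/SATA split described above amounts to a placement rule keyed on how often a file is used. A minimal sketch, assuming a simple access-count threshold: the slide does not say how "frequently used" is decided, so `HOT_THRESHOLD`, `choose_tier`, and the sample paths are hypothetical, and this is not GPFS's actual policy mechanism.

```python
from dataclasses import dataclass

# Hypothetical cutoff: accesses per day at or above which a file counts
# as "frequently used". The slides do not give a real value.
HOT_THRESHOLD = 10


@dataclass
class FileStats:
    path: str
    accesses_per_day: int


def choose_tier(stats: FileStats) -> str:
    """Return "SAS" for frequently used files and "SATA" for the rest."""
    return "SAS" if stats.accesses_per_day >= HOT_THRESHOLD else "SATA"


# Made-up sample files, for illustration only.
files = [
    FileStats("/scratch/run42/checkpoint.dat", 120),
    FileStats("/archive/2011/survey.tar", 0),
]
placement = {f.path: choose_tier(f) for f in files}
print(placement)
```

A real deployment would express this rule in the filesystem's own policy language rather than in application code; the sketch only shows the decision being made.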

Page 8: Accelerating Data Management - Dave Fellinger - RDAP12


Example configuration

[Diagram: one controller with 8x 10GbE Host Ports, Interface Virtualization, RAID Processors, System memory, High-Speed Cache, Cache Link, and Internal SAS Switching. It hosts four Linux virtual machines, each with an SFA Driver: one labeled iCAT and three labeled GPFS. Memory allocations shown are 16 GB for one VM and 8 GB for each of the other three. Storage is labeled as RAID sets with 2TB SSD, RAID sets with 30TB SAS, and RAID sets with 300TB SATA.]

Page 9: Accelerating Data Management - Dave Fellinger - RDAP12


Running Microservices as a VM

► Since iRODS runs inside the controller, we can now run iRODS microservices right on top of the storage.

► The storage has become an iRODS appliance ‘speaking’ iRODS natively.

► iRODS can execute “in-band” operations, registering data and extracting metadata during ingest.

► We could create ‘hot’ directories that kick off processing depending on the type of incoming data.
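The ‘hot’ directory idea above could look roughly like the following: scan a directory, register each new file along with some extracted metadata, and pick a processing step by file type. This is a plain-Python sketch, not iRODS rule language or any iRODS API; `HANDLERS`, `extract_metadata`, `ingest`, and the file types are all hypothetical.

```python
import hashlib
import os
import tempfile


def extract_metadata(path):
    """Collect basic metadata while the file is being ingested."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "sha256": digest,
    }


# Hypothetical mapping from file type to the processing step it kicks off.
HANDLERS = {
    ".csv": lambda meta: "filter",
    ".img": lambda meta: "reduce",
}


def ingest(hot_dir, catalog):
    """Register every file in hot_dir and return the step triggered per file."""
    triggered = {}
    for name in sorted(os.listdir(hot_dir)):
        path = os.path.join(hot_dir, name)
        if not os.path.isfile(path):
            continue
        meta = extract_metadata(path)
        catalog[name] = meta  # stands in for registering the data object
        handler = HANDLERS.get(os.path.splitext(name)[1].lower())
        triggered[name] = handler(meta) if handler else "register only"
    return triggered


# Usage: drop two files into a temporary 'hot' directory and ingest them.
with tempfile.TemporaryDirectory() as hot_dir:
    for name, payload in (("run1.csv", b"a,b\n1,2\n"), ("notes.txt", b"hello")):
        with open(os.path.join(hot_dir, name), "wb") as fh:
            fh.write(payload)
    catalog = {}
    triggered = ingest(hot_dir, catalog)

print(triggered)
```

Because registration and metadata extraction happen in the same pass that reads the file, the sketch mirrors the "in-band" point: no second trip to the data is needed.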

Page 10: Accelerating Data Management - Dave Fellinger - RDAP12


Conclusion

► The elimination of software or hardware layers increases reliability and decreases latency.

► Automated, policy-based services can be run within a storage environment.

► Policy execution can include data migration, data checking, or data manipulation by calling or scheduling microservices.

Data-intensive operations executed by a server can cause network traffic and SCSI bus transaction latency. Moving these operations to the storage is efficient and easily managed.
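As a rough illustration of policy execution calling microservices, the sketch below maps policy names to plain Python functions and runs a data-checking step that compares a recorded checksum with a recomputed one, plus a trivial migration step. None of this is iRODS syntax; `POLICIES`, `check_data`, `migrate_data`, `run_policy`, and the object records are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 hex digest of the object's bytes."""
    return hashlib.sha256(data).hexdigest()


def check_data(obj) -> bool:
    """Data checking: recompute the checksum and compare with the recorded one."""
    return checksum(obj["data"]) == obj["recorded_sha256"]


def migrate_data(obj) -> bool:
    """Data migration: move the object to the tier named in its record."""
    obj["tier"] = obj["target_tier"]
    return True


# Hypothetical policy table: policy name -> function standing in for a microservice.
POLICIES = {"check": check_data, "migrate": migrate_data}


def run_policy(name, objects):
    """Apply one policy to every object and report the outcome per object id."""
    return {obj["id"]: POLICIES[name](obj) for obj in objects}


# Made-up records: one intact object and one whose bytes no longer match.
good = {"id": "a", "data": b"payload", "recorded_sha256": checksum(b"payload"),
        "tier": "SAS", "target_tier": "SATA"}
bad = {"id": "b", "data": b"payload-corrupted", "recorded_sha256": checksum(b"payload"),
       "tier": "SAS", "target_tier": "SATA"}

results = run_policy("check", [good, bad])
run_policy("migrate", [good])
print(results, good["tier"])
```

Run inside the storage, steps like these would touch the data without crossing the network, which is the efficiency the conclusion points to.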

Page 11: Accelerating Data Management - Dave Fellinger - RDAP12


DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler, xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.

Questions?