Spectrum Scale final

IBM Spectrum Scale Software defined storage for cloud, big data & analytics, and NAS solutions A Smarter Cloud for a Smarter Planet Joe Krotz IBM CTS – Cloud and Storage Systems August 2015


IBM Spectrum Scale

Software defined storage for cloud, big data & analytics, and NAS solutions

A Smarter Cloud for a Smarter Planet

Joe Krotz

IBM CTS – Cloud and Storage Systems

August 2015

The top two challenges organizations face with IT infrastructure are storage related: Data Management and Cost Efficiency

Traditional Storage Models are being disrupted by the explosion of data

Data Explosion
• 2.5 billion gigabytes of data per day
• 90% of data created in last two years

Data Economics
• 0.4% overall IT budget growth in 2015
• 670% more data in 5 years for storage administrators

Data Innovation
• 30% lower TCO with Flash
• 50% lower storage management cost and Hybrid delivery with Software Defined Storage

SOURCE: *2014 IBM Institute for Business Value Study on Infrastructure Matters; Gartner IT Metrics

Software Defined Infrastructure

IBM.com/systems/storage/spectrum/

IBM Spectrum Scale – History and evolution


1998 2002 2005 2006

HPC

GPFS (General Parallel File System)

General File Serving
• Standards: Portable Operating System Interface (POSIX) semantics
• Large block
• Directory and small file perf
• Data management

Virtual Tape Server (VTS)

Linux® Clusters (Multiple architectures)

IBM AIX® Loose Clusters

GPFS 2.1-2.3

HPC

Research, visualization, digital media, seismic, weather exploration, life sciences

32 bit / 64 bit

Inter-op (IBM AIX & Linux)

GPFS Multicluster

GPFS over wide area networks (WAN)

Large scale clusters, thousands of nodes

GPFS 3.1-3.2

2009

First called GPFS

GPFS 3.4

Enhanced Windows cluster support: homogeneous Windows Server

Performance and scaling improvements

Enhanced migration and diagnostics support

2010

GPFS 3.3

Restricted Admin Functions

Improved installation

New license model

Improved snapshot and backup

Improved ILM policy engine

2012

Ease of administration

Multiple-networks/ RDMA

Distributed Token Management

Windows 2008

Multiple NSD servers

NFS v4 Support

Small file performance

Information lifecycle management (ILM)

Storage pools, filesets, policy engine

GPFS 3.5

Active File Management

GPFS Native RAID

GPFS Shared Nothing Cluster (GPFS-FPO)

GPFS Storage Server

Research in video streaming started in 1993, commercialized in 1994

GPFS 4.1+

Part of IBM Spectrum Storage Software Defined Storage

GPFS 4.1

Encryption

Performance (LROC, AFM)

Usability (Network Monitor, NFS migration, FPO)

Elastic Storage on Linux on System z

Cloud Service on IBM Softlayer

Elastic Storage Server

2014: Code name Elastic Storage

2015: IBM Spectrum Scale

GPFS 4.1.1

Enhanced Client experience

New protocols: NFSv4, SMB/CIFS, improved OpenStack Swift

Async DR

Spectrum Scale proven at over 3,000 customers worldwide


Climate and weather modeling with 16 petabytes online and 12 petabytes archived on tape

Four-time champion Infiniti Red Bull Racing does real-time race analytics

Wind turbine design analysis done in hours instead of weeks

Private cloud for digital media enables global collaboration for film production

Personalized cancer treatment for over 65,000 patients

R&D environment for natural language tools


IBM Spectrum Scale Key Features

IBM Spectrum Scale

Data Security, Performance and Usability

Security
• Native encryption and secure erase
• Disaster Recovery

Scalability and Snapshots
• Point in time copies

Performance
• Flash acceleration and local read only cache
• Fully integrated ILM

Usability
• New configuration guidance
• Installation toolkit quickly installs IBM Spectrum Scale software for Client, Server, FPO, and protocol nodes
• Tight integration with IBM Spectrum Control for system health monitoring


Native Encryption and Secure Erase

• Native: encryption is built into the “Advanced” product

• Protects data from security breaches, unauthorized access, and being lost, stolen or improperly discarded

• Cryptographic erase for fast, simple and secure file deletion

• Complies with NIST SP 800-131A and is FIPS 140-2 certified

• Supports HIPAA, Sarbanes-Oxley, EU and national data privacy law compliance


Native Encryption and Secure Erase

Encryption of data at rest

• Files are encrypted before they are stored on disk

• Keys are never written to disk

• No data leakage in case disks are stolen or improperly decommissioned

Secure deletion

• Ability to destroy arbitrarily large subsets of a file system

• No “digital shredding”, no overwriting: secure deletion is a cryptographic operation
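The idea behind cryptographic erase can be sketched in a few lines of Python. This is a toy illustration only: the `ToyEncryptedStore` class, its in-memory key dictionary and the SHA-256 based XOR keystream are all assumptions made for the example, not the cipher or key management that Spectrum Scale uses. It shows why discarding a key makes the stored ciphertext unreadable without overwriting it.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || offset) blocks.

    Illustration only; not production cryptography.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyEncryptedStore:
    """Hypothetical store: keys stay in memory, ciphertext stands in for disk."""

    def __init__(self):
        self._keys = {}   # per-file keys, never written to "disk"
        self.disk = {}    # what someone holding the disks would see

    def write(self, name: str, plaintext: bytes) -> None:
        key = secrets.token_bytes(32)
        self._keys[name] = key
        self.disk[name] = keystream_xor(key, plaintext)

    def read(self, name: str) -> bytes:
        # XOR with the same keystream decrypts the data
        return keystream_xor(self._keys[name], self.disk[name])

    def secure_erase(self, name: str) -> None:
        # Cryptographic erase: drop the key, leave the ciphertext untouched
        del self._keys[name]
```

After `secure_erase`, the ciphertext is still present in `disk`, but `read` fails because the key is gone, which mirrors the "no overwriting" point on the slide.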


Data protection: Disaster recovery

The challenge: How do I recover data after a major disastrous event that could not be anticipated?

• Force majeure, e.g. earthquake or hurricane

• Accidents, e.g. fire or flood

• Administrator mistake

The Solution

• IBM Spectrum Scale lets you mirror your data at a secondary site

• Set your Recovery Point Objective (RPO), for example 15 minutes, 30 minutes, or 1 hour

• If the primary site fails, data requests are automatically redirected to the secondary site

• Asynchronous updates accommodate unreliable networks
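As a rough illustration of what an RPO means in practice, the sketch below checks whether the secondary site is within the chosen objective. The function names and the idea of tracking a single "last replicated" timestamp are assumptions for the example; the slides do not describe how the product measures replication lag.

```python
from datetime import datetime, timedelta


def worst_case_loss(last_replicated: datetime, now: datetime) -> timedelta:
    """Updates made after the last replicated point would be lost on failover."""
    return now - last_replicated


def rpo_met(last_replicated: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the secondary site lags the primary by no more than the RPO."""
    return worst_case_loss(last_replicated, now) <= rpo
```

For example, with a 15 minute RPO, a secondary that last caught up 10 minutes ago is within the objective, while one that is 30 minutes behind is not.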


Scalability and Snapshots

IBM Spectrum Scale can create snapshots at the file system, fileset, and file level. Each Spectrum Scale file system can hold multiple snapshots of any of these types at the same time.


The snapshot function allows a backup or mirror program to run concurrently with user updates and still obtain a consistent copy of the file system as of the time that the snapshot was created.


Snapshot capacity

IBM Spectrum Scale V4.1 can retain 256 Global snapshots and 256 Snapshots of each Independent Fileset.

Spectrum Scale 4.1 can have 10,000 dependent filesets and 1,000 independent filesets.

Scalability

Maximum number of files per file system: 2^64 (9 quintillion)

Maximum file system size: 2^99 bytes

Maximum number of nodes: 16,384

IBM Spectrum Scale is designed to meet the needs of data-intensive applications such as engineering design, digital media, data mining, relational databases, financial analytics, seismic data processing, scientific research and scalable file serving. The solution scales up to more than a billion petabytes of data and hundreds of GB/s throughput.
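The limits above can be checked with a little arithmetic. The sketch assumes the slide's "264" and "299" are the exponents 2^64 and 2^99 with the superscripts lost in extraction, and uses decimal petabytes (10^15 bytes). Note that 2^64 is about 18.4 quintillion, so the slide's rounded "9 quintillion" is closer to 2^63.

```python
MAX_FILES = 2 ** 64        # assumed reading of "264"
MAX_FS_BYTES = 2 ** 99     # assumed reading of "299 bytes"
PETABYTE = 10 ** 15        # decimal petabyte


def in_petabytes(num_bytes: int) -> float:
    """Convert a byte count to decimal petabytes."""
    return num_bytes / PETABYTE


print(f"{MAX_FILES:,} files per file system")
print(f"{in_petabytes(MAX_FS_BYTES):.2e} PB maximum file system size")
```

The second figure comes out at roughly 6.3 x 10^14 PB, which is consistent with the "more than a billion petabytes" claim.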

Flash Local Read Only Cache (LROC)

Clients

Spectrum Scale

Flash LROC SSDs

• Inexpensive SSDs or Flash placed directly in Client nodes

• Accelerates I/O performance up to 6x by reducing the amount of time CPUs wait for data

• Also decreases the overall load on the network, benefitting performance across the board

• Improves application performance while maintaining all the manageability benefits of shared storage

• Cache consistency ensured by standard tokens

• Data is protected by checksum and verified on read

• Spectrum Scale handles the flash cache automatically so data is transparently available to your application with very low latency and no code changes


LROC Flash Cache Example Speed Up

• Initially, with all data coming from the disk storage system, the client reads data from the SAS disks at ~5,000 IOPS

• As more data is cached in Flash, client performance increases to 32,000 IOPS while reducing the load on the disk subsystem by more than 95%

• ~5,000 IOPS with 10K RPM SAS drives versus ~32,000 IOPS with Flash SSD: a ~6x speedup

• Two consumer grade 200 GB SSDs cache a storage system of forty-eight 300 GB 10K SAS disks
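The figures in this example can be worked through directly. The sketch below uses only the numbers quoted on the slide; the function names are illustrative.

```python
def speedup(cached_iops: float, disk_iops: float) -> float:
    """Ratio of cached performance to disk-only performance."""
    return cached_iops / disk_iops


def cache_fraction(cache_gb: float, backing_gb: float) -> float:
    """Cache capacity as a fraction of the backing disk capacity."""
    return cache_gb / backing_gb


iops_gain = speedup(32_000, 5_000)            # 6.4, the slide's "~6x"
cache_gb = 2 * 200                            # two 200 GB SSDs
backing_gb = 48 * 300                         # forty-eight 300 GB SAS disks
fraction = cache_fraction(cache_gb, backing_gb)

print(f"speedup: {iops_gain:.1f}x")
print(f"cache is {fraction:.1%} of {backing_gb:,} GB of disk")
```

In this example, 400 GB of flash is under 3% of the 14,400 GB of disk behind it.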


Performance: Using IBM Spectrum Scale with FlashSystem

FlashSystem is data center optimized to deliver extreme performance, flexible capacity and total system protection

• FlashSystem as storage tier

• FlashSystem as cache

• FlashSystem for metadata storage

[Diagram: IBM FlashSystem and HDD storage as primary storage for a Spectrum Scale cluster. Labels: All files, Hot files, All other files, Data and metadata, Data, Metadata]

Collaboration with Active File Management (AFM)

• AFM makes the global namespace truly global by automatically managing asynchronous replication of data

• Only the modified contents are synchronized from the primary to the remote site

• Local caching: cached data access performs much better than WAN access

• Latencies are improved

• WAN link usage is reduced
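The benefit of shipping only modified contents can be illustrated with a toy comparison of two sites. This is a sketch under stated assumptions: the dictionaries standing in for sites, the `sync` function and the use of SHA-256 digests to detect changes are all hypothetical, and the slides do not say how AFM itself tracks modifications.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content fingerprint used to detect changed files."""
    return hashlib.sha256(content).hexdigest()


def sync(primary: dict, remote: dict) -> list:
    """Copy only files whose content differs; return the paths that were sent."""
    remote_digests = {path: digest(data) for path, data in remote.items()}
    changed = sorted(
        path for path, data in primary.items()
        if remote_digests.get(path) != digest(data)
    )
    for path in changed:
        remote[path] = primary[path]   # only these cross the WAN link
    return changed
```

With two files of which one has changed, only that one file is transferred, and a second `sync` call transfers nothing.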


IBM Spectrum Scale Solutions

3 Deployment Options

• Software Only: IBM Spectrum Scale software licenses (Express, Standard or Advanced Editions)

• IBM Elastic Storage Server: IBM Spectrum Scale SW, GUI, GNR, drives, services

• Managed Service: IBM high performance services for data (Spectrum Scale + SoftLayer Cloud)

Spectrum Scale Parallel Architecture


Clients use data; Network Shared Disk (NSD) servers serve shared data

All NSD servers export to all clients in active-active mode

Spectrum Scale stripes files across NSD servers and NSDs in units of file-system block-size

NSD client communicates with all the servers

File-system load is spread evenly across all the servers and storage, with no hotspots

Easy to scale file-system capacity and performance while keeping the architecture balanced

NSD Client does real-time parallel I/O to all the NSD servers and storage volumes/NSDs

File stored in blocks
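Striping in units of the file-system block size can be sketched as follows. Round-robin placement is a simplifying assumption for illustration; the slides do not describe the actual allocation policy, and the NSD names are hypothetical.

```python
def stripe(file_size: int, block_size: int, nsds: list) -> dict:
    """Assign each block of a file to an NSD in round-robin order."""
    num_blocks = -(-file_size // block_size)   # ceiling division
    layout = {nsd: [] for nsd in nsds}
    for block in range(num_blocks):
        layout[nsds[block % len(nsds)]].append(block)
    return layout


MIB = 1024 * 1024
layout = stripe(10 * MIB, 1 * MIB, ["nsd1", "nsd2", "nsd3"])
for nsd, blocks in layout.items():
    print(nsd, blocks)
```

A 10 MiB file with a 1 MiB block size yields ten blocks spread over three NSDs, so a client reading the whole file can issue I/O to all three in parallel.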


Spectrum Scale Cluster Models

[Diagram: three cluster models. Labels: Application Nodes, NSD Servers, Storage, Storage Network, TCP/IP or InfiniBand network, TCP/IP or InfiniBand RDMA network]

IBM Elastic Storage Server: IBM Spectrum Scale bundled solution

Model GL6: 2 servers, 6 enclosures, 28U, 348 NL-SAS, 2 SSD; 2, 4, or 6 TB drives; 12+ GB/sec

Delivers Extreme Data Integrity and Space Efficiency
• 2- and 3-fault-tolerant erasure codes
• Up to 2 PB per rack
• End-to-end checksum
• Protection against lost writes
• Disk Hospital: proactively detect, diagnose and resolve disk issues

Breakthrough Performance
• High performance, less hardware
• De-clustered RAID reduces app load during rebuilds
• Up to 3x lower overhead to applications
• Built-in SSDs and NVRAM for write performance
• Fastest rebuild times using De-clustered RAID
• Graphical disk failure management

Lowers TCO
• 3 Years Maintenance and Support
• General Purpose Servers
• Off-the-shelf SBODs
• Standardized in-band SES management
• Standard Linux
• Modular Upgrades
• Faster than alternatives today and tomorrow!
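The "up to 2 PB per rack" figure follows from the Model GL6 drive count, and the space efficiency of an erasure code can be estimated the same way. In the sketch below, the drive count and drive size come from the slide, while the 8+2 and 8+3 stripe widths are an assumption chosen to illustrate 2- and 3-fault-tolerant codes; spare space and metadata overhead are ignored.

```python
def raw_capacity_tb(drives: int, drive_tb: int) -> int:
    """Raw capacity before any redundancy overhead."""
    return drives * drive_tb


def usable_fraction(data_strips: int, parity_strips: int) -> float:
    """Share of raw capacity that holds user data under an erasure code."""
    return data_strips / (data_strips + parity_strips)


raw_tb = raw_capacity_tb(348, 6)                       # 348 NL-SAS drives at 6 TB
two_fault = usable_fraction(8, 2)                      # assumed 8 data + 2 parity
three_fault = usable_fraction(8, 3)                    # assumed 8 data + 3 parity
three_way_replication = usable_fraction(1, 2)          # one copy plus two replicas

print(f"raw: {raw_tb:,} TB")
print(f"8+2: {two_fault:.0%}, 8+3: {three_fault:.0%}, "
      f"3-way replication: {three_way_replication:.0%}")
```

Under these assumptions an erasure code that tolerates two failures keeps 80% of raw capacity usable, against about 33% for three-way replication with the same fault tolerance.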

Spectrum Scale Use Cases

• Single software defined storage solution across all these application types
• Linear capacity & performance scale out
• Enterprise class storage using standard hardware
• Single Name Space

[Diagram: Spectrum Scale shared storage serving NAS (File: POSIX, NFS, SMB/CIFS), Big Data & Analytics (Hadoop Connector) and Cloud (Cinder for block, Swift for object)]

NAS Solutions


IBM Spectrum Scale benefits over other NAS solutions

Better performance
• Eliminate hotspots with massively parallel access to files
• Sequential I/O with ES greater than 400 GB/s
• Throughput advantage for parallel streaming workloads, e.g. Tech Computing and Analytics
• More Storage. More Files. Hyper Scale.

Simplified Management
• Easier management with one global namespace instead of managing islands of NAS arrays, e.g. no need to copy data between compute clusters
• Integrated policy driven automation
• Fewer storage administrators required

Lower Cost
• Optimizes storage tiers including flash, disk and tape
• Increased efficiency and more efficient provisioning due to parallelization and striping technology
• Remove duplicate copies of data, e.g. run analytics on one copy of data without having to set up a separate silo

Data Access: IBM Spectrum Scale protocol support

• The IBM Spectrum Scale Protocol Node allows access to data stored in a Spectrum Scale filesystem, using additional access methods and protocols.

• The Protocol Node functions are clustered and can support transparent failover for the NFS, Swift and SMB protocols.

• Multiprotocol data access from other systems using the following protocols

• NFS v3 and v4

• SMB 2 and SMB 3.0 mandatory features / CIFS for Windows support

• SMB support is delivered by Samba 4.2.

• 3,000 active connections per node / 20K max

• OpenStack Swift and S3 API support for object storage.

Data Access: Protocol Support

[Diagram: protocol node architecture. Labels: Users; Administrator (Command Line Interface); access protocols NFS, SMB/CIFS, POSIX, OpenStack Swift, OpenStack Cinder; external TCP/IP or IB network; Protocol Nodes PN1, PN2, ... PNn; Mgmt Nodes; Authentication Services (Keystone); Spectrum Scale cluster nodes; IBM Spectrum Scale cluster TCP/IP or IB network; Network Shared Disks NSD1, NSD2, ... NSDn; Physical Storage (Flash, Disk, Tape); Elastic Storage Server]

New GUI coming in October!

Hadoop Integration

Spectrum Scale: Drop-in Replacement for HDFS

Adding Analytics without adding a dedicated Analytics infrastructure

• Hadoop connector
• Supports IBM BigInsights Analytics and open source Apache Hadoop
• Existing infrastructure can do Hadoop-based Analytics
• No need to purchase a dedicated Analytics infrastructure, lowering CAPEX and OPEX
• No need to move data in and out of an Analytics dedicated silo
• Software defined infrastructure for multi-tenancy
• Enterprise-class protection and efficiency
  ‒ Full data lifecycle management
  ‒ Policy based tiering from flash to disk to tape
• Reduce cost, simplify management

[Diagram: Compute Cluster using Spectrum Scale in place of HDFS]

IBM Spectrum Scale: Hybrid storage for Hadoop Applications

[Diagram: Hadoop Application, Hadoop File System API, Spectrum Scale Hadoop Connector, Spectrum Scale client, Shared Storage Pools (Disk, Flash), Shared Nothing Cluster Pool]

• Exploit locality for the files stored in the local storage

• Access shared storage through the same connector

• Storage is completely transparent to the application

• Scale storage independent of compute nodes

• The IBM Spectrum Scale Hadoop connector has been extended to support shared storage that includes SAN Based storage, shared nothing cluster configurations, and integrated solutions like ESS.

• Full Hadoop interfaces for Map/Reduce analytics processing.

• No transfer or ingest required as the data is already there

Cloud Enablement

Spectrum Scale OpenStack Cinder Driver

• OpenStack Havana release includes a Cinder driver
  • Giving architects access to the features and capabilities of the industry’s leading enterprise scale-out software defined storage
• With OpenStack on Spectrum Scale, all nodes see all data
  • Copying data between services, like Glance to Cinder, is minimized or eliminated
  • Speeding instance creation and conserving storage space
• Rich set of data management and information lifecycle features
  • Efficient file clones
  • Policy based automation optimizing data placement for locality or performance tier
  • Backup
• Industrial strength reliability, minimizing risk
• Cinder driver provides resilient block storage, minimal data copying between services, speedy instance creation and efficient space utilization

IBM Spectrum Scale and OpenStack Swift

• Consolidate File and Object under a single shared storage infrastructure.

• The new IBM Spectrum Scale Protocol Node lets you share the storage infrastructure for both Files and Objects

• Running your object store on IBM Spectrum Scale provides these key features:

• POSIX/NFS/SMB/Object in single storage cluster with a single authentication scheme

• Extra layers of data protection through Snapshots, Backup, and/or Disaster Recovery

• Integrated ILM tiering to move cold objects to low cost tier and off premise

• Encryption of data at rest and Secure Erase

• Additional data protection with the ESS solution

[Diagram: IBM Spectrum Scale serving NFS, SMB, POSIX, Swift, HDFS, Cinder, Glance and Manila over SSD, fast disk, slow disk and tape]

OpenStack and IBM Spectrum Scale help clients manage data at scale

Business need (Business): I need virtually unlimited storage
IBM Spectrum Scale: An open & scalable cloud platform

Business need (Operations): I need a flexible infrastructure that supports both object and file based storage
IBM Spectrum Scale: A single data plane that supports Cinder, Glance, Swift, Manila as well as NFS, et al.

Business need (Operations): I need to minimize the time it takes to perform common storage management tasks
IBM Spectrum Scale: A fully automated policy based data placement and migration tool

Business need (Collaboration): I need to share data between people, departments and sites with low latency
IBM Spectrum Scale: Sharing with a variety of WAN caching modes

Data Center and Point of Presence

[Map legend: Data Center; New Data Centers in 2014; Network Point of Presence]

• 100,000+ servers
• 21,000 customers
• 20,000,000 active domains
• IPv4/IPv6 dual stack
• Global DNS
• Global DDoS mitigation
• Global Internet Exchanges & Peering

Infrastructure solution with a common management interface and API across a unified architecture

Mix and match bare metal servers, virtual server instances, and hosted private clouds

Full integration with all IBM storage portfolio offerings

Full OpenStack, RESTful API, SmartCloud Storage and IBM Storage Integration Server integration

Seamless scaling for Cloud and large deployments. This includes Public, Private and Hybrid solutions

Bare metal with your own stack

Dedicated virtualized environment

Shared virtual environment

Dedicated virtualized environment

Triple Network Architecture

Automation & Support

Delivers Outstanding Performance & Price

Flexibility to Deliver Dynamic/ Hybrid Capability

ILM and Archiving with LTFS

What is LTFS?

1) Open Format for data which is written to tape
• Developed and disclosed by IBM
• Describes the format of data and metadata stored on tape
• Metadata is based on XML schema
• Applicable to LTO5, LTO6 and TS1140
• Requires tape partitioning

2) File System support (code) to R/W tapes in LTFS format
• Externalizes the LTO5 / LTO6 / TS1140 tape as file system
• Enables standard applications to write/read LTFS tapes
• Supports update, edit, and delete of files on LTFS tape
• Supports partial recall
• Available on Linux, Mac OS X and Windows
• Makes tape look and work like any removable media (e.g., USB drive, removable disk)

LTFS Mount point is the library

Cartridges are subdirectories

LTFS mounts cartridges into drive to service file access requests

Easy usage, no ISV required

Caching of tape indices in memory, for searching and displaying tape contents without needing a mount

Data life cycle stages: ingestion or creation, processing, access, archival

High Performance Tier: Flash, SSD, SAS

Parallel Access

Provide highest performance for most demanding applications

High volume storage

Single Global Name Space across all tiers

Lower costs by allocating the right tier of storage to the right need

Archival storage with low cost disk or tape

Integration with Spectrum Protect and Spectrum Archive

Policy based Archival and remote Disaster Recovery

Manage the full data life cycle cost-effectively through policy driven ILM
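Policy driven tiering of this kind can be sketched as a small rule table that maps file age to a storage tier. Everything here is illustrative: the `FileInfo` type, the tier names and the 7 and 90 day thresholds are assumptions, and this is not the product's own policy syntax, which the slides do not show.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    (7, "flash"),    # touched within a week: high performance tier
    (90, "disk"),    # touched within three months: high volume tier
]
DEFAULT_TIER = "tape"   # everything colder goes to archival storage


def choose_tier(info: FileInfo) -> str:
    """Return the tier a file should live on under the rules above."""
    for max_age_days, tier in RULES:
        if info.days_since_access <= max_age_days:
            return tier
    return DEFAULT_TIER


for f in [FileInfo("hot.dat", 1), FileInfo("warm.dat", 30), FileInfo("cold.dat", 365)]:
    print(f.name, "->", choose_tier(f))
```

Because all tiers sit under one global namespace, a file keeps the same path as it moves between tiers; only its placement changes.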


The Solution: IBM Spectrum Scale brings it all together

Global Name Space

IBM Spectrum Scale replaces SAN-based file systems
• Replaces NTFS, EXT4, JFS2 and other POSIX file systems
• Used by over 200 of the top 500 supercomputers
• No file transfers required between different OS
• Can be used with everything from databases to video streaming
• For x86, POWER and z System servers
• Secure with data-at-rest encryption

IBM Spectrum Scale replaces HDFS and NAS file storage
• Full Hadoop interfaces for Map/Reduce analytics processing
• No transfer or ingest required as the data is already there
• Fully protected with Backup Software
• File-level access support for NFS, CIFS, FTP, SCP and HTTPS
• Supports Enterprise File Sync-and-Share via OwnCloud or Funambol

IBM Spectrum Scale offers Object access
• Object-level access based on OpenStack Swift driver and Amazon S3 APIs

IBM Spectrum Scale supports all media and integrates seamlessly with LTFS
• Spans flash, disk and tape media with a file system view that

For more information:

Websites:

http://www.ibm.com/systems/storage/spectrum/scale/

http://www.ibm.com/cloud-computing/us/en/

Product Pages:

http://www-03.ibm.com/systems/storage/flash/

http://www-03.ibm.com/systems/storage/spectrum/ess/

http://www-03.ibm.com/systems/storage/tape/ltfs/

IBM RedBooks

https://www.redbooks.ibm.com/

Thank you!
