© 2013 IBM Corporation, Platform Computing

Page 1:

IBM BigInsights 2.1
Understanding the role of IBM Platform Computing and GPFS FPO in the STG BigInsights 2.1 Reference Architecture

Gord Sissons, Steve Hurley, Chris Porter, Blane Rockafellow

Page 2:

Agenda

• About the BigInsights 2.1 HW reference architecture

• Solution components

• Key BigInsights Advantages

• Platform Computing Products

• IBM Platform Symphony

• IBM GPFS FPO

• IBM Platform Cluster Manager

Page 3:

The IBM System X BigInsights Reference Architecture

• One of a family of big data reference architectures from IBM

• Enables fast, risk-free deployment with validated configurations

• Flexibility to accommodate different client needs

• Value-added software components can be implemented with Lab Services

• Pre-assembled racks
• Customized to your needs
• Integrated and tested
• Supported as a solution
• Tailored to your needs
• Start small… and grow
• Easy to order
• Easy to manage

Page 4:

IBM BigInsights Reference Architecture
Hardware: Incorporating a balance of value, enterprise and performance options

Configuration               | Starter              | Half Rack w/ Mgmt Nodes* | Full Rack w/ Mgmt Nodes* | Full Data Node Rack*
Available storage (2TB/3TB) | 108TB / 144TB        | 324TB / 432TB            | 648TB / 864TB            | 720TB / 960TB
Raw data space (2TB/3TB)    | 27TB / 36TB          | 81TB / 108TB             | 114TB / 216TB            | 180TB / 240TB
Mgmt Nodes / Data Nodes     | 1 Mgmt / 3 Data      | 3 Mgmt / 9 Data          | 3 Mgmt / 18 Data         | 0 Mgmt / 20 Data
Switches                    | 1 x 10GbE / 1 x 1GbE | 1 x 10GbE / 1 x 1GbE     | 1 x 10GbE / 1 x 1GbE     | 1 x 10GbE / 1 x 1GbE

* Number of management nodes required varies with cluster size and workload; for multi-rack configs, select combination of these racks as needed

Management Node: x3550 M4 with
• Two E5-2650 2GHz 8-core CPU
• 128GB RAM, 16x 8GB 1600MHz RDIMM
• Four 600GB 2.5” HDD (OS)
• Two dual-port 10GbE (data)
• Dual-port 1GbE (mgmt)

Data Node: x3630 M4 with
• Two E5-2450 2.1GHz 8-core CPU
• 48GB RAM, 6x 8GB 1600MHz RDIMM
• Two 3TB 3.5” HDD (OS/app)
• Twelve 3TB 3.5” HDD (data)
• Optional 4TB HDD upgrade
• Dual-port 10GbE (data)
• Dual-port 1GbE (mgmt)
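
The capacity figures on this slide appear consistent with twelve data drives per data node at 3TB or 4TB each (for example, 3 nodes x 12 drives x 3TB = 108TB), with raw data space at about one quarter of available storage. Both are inferences from the printed numbers, not statements from the deck, and the one-quarter ratio in particular is a hypothetical parameter. A minimal Python sketch of that arithmetic:

```python
# Assumptions inferred from the slide's numbers, not stated in the deck:
DATA_DRIVES_PER_NODE = 12   # twelve data drives per data node
RAW_FRACTION = 0.25         # hypothetical ratio of raw data space to available storage

# Data-node counts per configuration, taken from the slide.
DATA_NODES = {
    "Starter": 3,
    "Half rack": 9,
    "Full rack": 18,
    "Full data node rack": 20,
}


def available_tb(data_nodes: int, drive_tb: int) -> int:
    """Total capacity of the data drives across all data nodes, in TB."""
    return data_nodes * DATA_DRIVES_PER_NODE * drive_tb


def raw_data_tb(data_nodes: int, drive_tb: int) -> float:
    """Raw data space under the assumed one-quarter ratio, in TB."""
    return available_tb(data_nodes, drive_tb) * RAW_FRACTION


if __name__ == "__main__":
    for name, nodes in DATA_NODES.items():
        for drive_tb in (3, 4):
            print(f"{name}, {drive_tb}TB drives: "
                  f"{available_tb(nodes, drive_tb)}TB available, "
                  f"{raw_data_tb(nodes, drive_tb):g}TB raw data space")
```

Under these assumptions every printed value is reproduced except the full-rack raw data space with 3TB drives, which would come out as 162TB rather than the 114TB shown.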

Page 5:

IBM BigInsights Reference Architecture
Software: Your choice of best-of-breed and open-source components

• Analytics software environment: IBM InfoSphere BigInsights
• Resource sharing / multi-tenancy: Platform Symphony Advanced Edition (optional)
• Scheduler: Hadoop Scheduler / Platform Symphony Scheduler (included)
• Distributed file system: HDFS, or GPFS 3.5.0.9 (optional) with the GPFS FPO Connector
• Provisioning and cluster management: Platform Cluster Manager, Platform Cluster Manager – AE (optional)
• Operating system: RHEL, SUSE 64-bit Linux
• Network: 2 x Mellanox ConnectX data network, 1 x dual-port 1 GbE management network
• Hardware: x3550 M4 master node(s), x3630 M4 compute nodes

* Optional items should be sold with Lab Services to ensure proper installation and configuration

Page 6:

A comprehensive solution for big data analytics

[Figure: The IBM Big Data Platform. Analytic applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics) sit on the Big Data Platform (Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Data Warehouse, Hadoop System, Stream Computing), which runs on an agile, multi-tenant shared infrastructure.]

• Comprehensive platform
• Data at rest, data in motion
• Structured, unstructured, semi-structured
• Extensive library of data connectors
• Rich development tools
• Application accelerators
• Web-based management console

Page 7:

The IBM Big Data Platform

[Figure: IBM InfoSphere BigInsights within the IBM Big Data Platform. Legend: IBM, Open Source, Optional.]

Groups shown: Visualization & Discovery; Integration; Workload Optimization; Runtime / Scheduler; Advanced Analytic Engines; File System; Data Store; Applications & Development; Administration; Management.

Components shown: Streams, Netezza, Flume, DB2, DataStage, MapReduce, HDFS, HBase, Text Processing Engine & Extractor Library, BigSheets, JDBC, Text Analytics, Pig, Jaql, Hive, Index, Splittable Text Compression, Enhanced Security, Flexible Scheduler, ZooKeeper, Lucene, Oozie, Adaptive MapReduce, Integrated Installer, Admin Console, Sqoop, Adaptive Algorithms, Dashboard & Visualization, Apps, Workflow Monitoring, HCatalog, Security, Audit & History, Lineage, R, Guardium, Platform Computing, Cognos, Symphony, GPFS FPO, Symphony AE.

Page 8:

Complexity - A Key Customer Challenge

Multiple distributed software components, often deployed on separate infrastructure: expensive to deploy, expensive to manage, expensive to evolve

Page 9:

Cluster sprawl drives cost and inefficiency

Operational challenges are looming
• Fast evolving ecosystem
• Multiple versions and distributions
• Many inter-dependencies
• Data management challenges (HDFS)
• Application lifecycle management concerns

From Mike Gualtieri, Forrester Research

Page 10:

A smarter, consolidated infrastructure

[Figure: consolidated infrastructure stack. Labels: Multi-tenant shared service environment, Workload Manager(s), Resource Orchestration, Provisioning & Management, Enterprise Storage.]

Page 11:

IBM PLATFORM SYMPHONY
Understanding the advantage

Page 12:

IBM Platform Symphony

• A heterogeneous grid management platform

• A high-performance SOA middleware environment

• Supports diverse compute & data intensive applications
  • Compute and data intensive ISV analytic applications
  • In-house analytic applications (C/C++, C#/.NET, Java, Excel, R etc.)

• Optimized low-latency Hadoop compatible run-time

• Can be used to launch, persist and manage non-grid aware application services

• React instantly to time-critical requirements

• Production proven multi-tenancy with resource sharing capabilities

• Embedded single-tenant license in InfoSphere BigInsights 2.1

Page 13:

Symphony brings unique capabilities to Big Data

Performance
• Performance advantages for a variety of MapReduce workloads: boost productivity and reduce or avoid cost

Resource sharing*
• Share infrastructure among departments and across multiple Hadoop and non-Hadoop applications to maximize efficiency and reduce cost

Scheduling agility
• Proportional, priority-based resource allocation, SLA guarantees, and fast configurable pre-emption ensure that Symphony can respond instantly to time-critical workloads

SLA management*
• Removes a major barrier to resource sharing, helping organizations evolve to a shared service model to maximize flexibility and reduce infrastructure costs

Reporting & Analytics*
• Optional Platform Analytics add-on enables organizations to monitor granular resource usage for charge-back accounting and improved capacity planning

Reliability
• Ensure reliability of core system services, and make individual Hadoop jobs recoverable to avoid down-time and ensure that critical reporting windows and SLAs are met

* IBM Platform Symphony Advanced Edition license required

Page 14:

IBM Platform Symphony

Performance

• Low-latency SOA workload manager
• Performance results vary between ~40% and ~10x depending on workload
• Audited results (1) show an average 7x advantage on social media workloads with a 50x advantage in raw scheduling performance
• Single-tenant (2) Symphony license included in BigInsights 2.1 Enterprise Edition
• Many performance enhancements
  • Push-model for low-latency scheduling
  • Shuffle-stage optimizations
  • Use of native APIs for JAR file movement
  • Generic slots to fully utilize cluster

1 - Audited STAC Report available for download - http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/symphony/highperfhadoop.html

2 - The embedded Symphony license entitles a user to run only a single instance of BigInsights. No limits are placed on concurrently executing BI workloads. Customers can purchase Platform Symphony Advanced Edition to support multiple grid consumers (tenants).

Comparative “sleep test” based on methodology to measure scheduling performance discussed at Hadoop World 2011. Compares Hadoop 0.20.2, Hadoop 1.0.1 (with 0.3 second heartbeat) and Hadoop 1.0.1 accelerated by IBM Platform Symphony.

http://www.slideshare.net/cloudera/hadoop-world-2011-hadoop-and-performance-todd-lipcon-yanpei-chen-cloudera

Page 15:

IBM Platform Symphony

Resource sharing

• Share resources among heterogeneous workloads (Hadoop and non-Hadoop)

• Up to 300 concurrent job trackers
• Flexible application profiles
• Support multiple IBM and third party analytic applications on a shared infrastructure
  • InfoSphere Streams, IBM DataStage, SPSS, SAS, MathWorks MATLAB, R etc.

Page 16:

IBM Platform Symphony

Scheduling agility

• Agile scheduling ensures that time-critical workloads start and finish fast

• Optionally give priority to interactive jobs (e.g. BigSheets, Big SQL)

• Resource allocations shift instantly based on priority adjustments and proportional allocations at run-time

• Generic slot model ensures that the cluster can be kept 100% busy
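
To make the idea of proportional allocation concrete, the following Python sketch divides a fixed pool of generic slots among consumers according to share weights and recomputes the split when a weight changes at run-time. It is an illustration only: the function name, the share weights and the consumer names are hypothetical, and it does not represent Symphony's scheduler or its API.

```python
def proportional_allocation(total_slots: int, shares: dict) -> dict:
    """Split total_slots among consumers in proportion to their share weights.

    Fractional slots are resolved with the largest-remainder method so that
    every slot is assigned and the allocations sum to total_slots.
    """
    total_weight = sum(shares.values())
    if total_weight <= 0:
        raise ValueError("at least one consumer needs a positive share")

    exact = {name: total_slots * weight / total_weight
             for name, weight in shares.items()}
    allocation = {name: int(value) for name, value in exact.items()}

    # Hand out the slots lost to rounding, largest fractional part first.
    leftover = total_slots - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - allocation[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical consumers sharing a 100-slot cluster.
    shares = {"batch": 3, "interactive": 1}
    print(proportional_allocation(100, shares))

    # Raising the interactive share at run-time shifts the split.
    shares["interactive"] = 3
    print(proportional_allocation(100, shares))
```

Because the slots are generic rather than fixed as map or reduce slots, any consumer can use whatever it is allocated, which is what allows the whole pool to stay busy.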

Page 17:

IBM Platform Symphony

SLA management

• Guarantee minimum quality of service
• Time-variant sharing policies
• Multiple resource sharing models
• Granular, directed sharing
• Configurable pre-emption policies
• Maintain multiple versions of application services to simplify life-cycle management
• Share resources between Dev, Test, Production & QA application instances

Page 18:

IBM Platform Symphony

Reporting and Analytics

• Comprehensive reporting built-in
• Monitor resource allocations to tune sharing
• Ensure business SLAs are being met
• Optional Platform Analytics add-on for OLAP analysis supporting chargeback accounting and improved capacity planning

Page 19:

IBM Platform Symphony

Reliability

• No single point of failure
• All services highly available
• Hadoop jobs recoverable in the event of failure
• Ensure deadlines and batch-windows are met
• Service replay debugger helps rapidly diagnose problems that occur in production at scale

• Production proven at scale

Page 20:

IBM GPFS
Bringing new capabilities to IBM BigInsights

Page 21:

GPFS – bringing new capabilities to BigInsights

POSIX compliance
• While HDFS is a single-purpose file system, GPFS implements the POSIX specification natively, meaning that multiple applications can share the same file system, improving flexibility and avoiding data redundancy

File system reliability
• GPFS FPO eliminates the name node as a single point of failure, improving file system reliability and recoverability

Flexible storage configuration
• Employ the right storage architecture depending on the application need, using shared nothing storage with n-way block replication for Hadoop workloads, and traditional GPFS storage for non-Hadoop workloads, to improve flexibility and minimize cost

Enterprise features
• GPFS FPO and GPFS can co-exist on the same cluster, bringing advanced features to Hadoop environments including active file management, information lifecycle management and file system snapshots to simplify the management of large storage infrastructure

Support from the source
• Avoid the risk of storing critical data on an open-source file system with limited support. IBM owns the codebase for GPFS and can provide mission critical support

Page 22:

GPFS – bringing new capabilities to BigInsights

POSIX file system

• Native POSIX file system
• Avoid workarounds like FUSE
• Avoid needless data movement and replication
• Variable block-sizes provide good performance across diverse types of workloads

A single filesystem for both MapReduce and non-MapReduce applications

[Figure: Hadoop MapReduce applications and native OS applications sharing one file system]
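
In practice, a native POSIX file system means that an ordinary program can read and write the data that MapReduce jobs use through standard file calls, with no Hadoop client library and no copy out of the cluster. The Python sketch below shows that style of access. The file name and record format are hypothetical, and a temporary directory stands in for a GPFS mount point so that the example runs anywhere.

```python
import os
import tempfile


def append_record(path: str, record: str) -> None:
    """Append one line to a file using ordinary POSIX-style file I/O."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(record.rstrip("\n") + "\n")


def count_records(path: str) -> int:
    """Count the non-empty lines in a file."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    # Stand-in for a mounted file system path; on a real cluster this would
    # be a directory under the site's own mount point (an assumption here).
    mount_point = tempfile.mkdtemp()
    data_file = os.path.join(mount_point, "events.log")

    append_record(data_file, "2013-06-01,login,alice")
    append_record(data_file, "2013-06-01,logout,alice")
    print(count_records(data_file), "records in", data_file)
```

The same functions would work unchanged against any mounted POSIX path, which is the point of the slide: native OS tools and Hadoop jobs can share one copy of the data.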

Page 23:

GPFS – bringing new capabilities to BigInsights

File system reliability

• GPFS FPO avoids the need for a central namenode, a common failure point in HDFS

• Avoid long recovery times in the event of name node failure

• Pipelined replication for efficient storage of block replicas in GPFS FPO environment

• Boost performance for metadata-intensive applications where the name node can emerge as a bottleneck

[Figure: An HDFS cluster with a namenode and secondary namenode, compared with an IBM BigInsights cluster with GPFS FPO]

Metadata is striped across GPFS FPO nodes, providing better reliability and avoiding the need for primary and secondary name nodes

Page 24:

GPFS – bringing new capabilities to BigInsights

Flexible storage configuration

• With distributed metadata, GPFS FPO avoids the need for a central namenode, a common failure point in HDFS environments

• Avoids long recovery times in the event that the namenode fails and metadata needs to be recovered from the secondary name node

• Pipelined replication for efficient storage of block replicas in GPFS FPO environment

[Figure: IBM BigInsights cluster with GPFS FPO. Shared nothing storage (GPFS FPO) alongside shared storage (GPFS) served by two GPFS servers over a switched fabric]

Page 25:

GPFS – bringing new capabilities to BigInsights

Enterprise features

PERFORMANCE & FLEXIBILITY
• Performance and efficiency: similar to HDFS for MapReduce workloads but with the option to deploy a high-performance parallel, shared file system

IMPROVED DATA SHARING FOR BETTER COLLABORATION
• Enable improved collaboration and efficient sharing of data among globally distributed teams

BUSINESS CONTINUITY AND DATA INTEGRITY
• Ensure business continuity and data integrity with more reliable storage and remote data replication

MORE EFFECTIVE MANAGEMENT OF DATA OVER ITS LIFECYCLE
• Support automated, cost-efficient management of data over its life-cycle

AVOID EXPENSIVE DATA SILOS WITH MORE VERSATILE STORAGE
• Avoid expensive data silos with a single storage environment that supports diverse application types

Page 26:

IBM PLATFORM CLUSTER MANAGER
Understanding the advantage

Page 27:

Platform Cluster Manager

Provisioning and management of distributed clusters, including self-service cluster creation and management by multiple user groups

IBM Platform Cluster Manager – Advanced Edition

Cluster & Grid Provisioning and Management

Page 28:

IBM Platform Cluster Manager – Advanced Edition

Overview
Multitenant self-service creation, flexing and management of multiple analytics and high performance computing (HPC) clusters

Key Capabilities

• Rapid deployment of heterogeneous analytics and HPC clusters

• Secure multi-tenant environment

• Dynamically grow and shrink clusters

• Provision physical and/or virtual machines

• Automates self-service cluster delivery and administration

• Consolidates infrastructure from multiple clusters, enabling analytics and HPC cloud environments

Benefits

• Faster time to full system readiness

• Single interface for integrated management & monitoring

• Reduces time to full user productivity

• Reduces IT costs with dramatic gains in infrastructure utilization

[Figure labels: Private; Analytics & HPC Cluster Mgmt; Open, Scalable, Proven; Resource Pools]

Page 29:

IBM Platform Cluster Manager – Advanced Edition

Ready-to-run clusters dynamically provisioned as tenants on shared infrastructure

[Figure: HPC and analytics consumers served by four grid instances provisioned by IBM Platform Cluster Manager Advanced Edition. Workload managers shown: IBM Platform LSF (Life Sciences / EDA / CFD / CAE), 3rd party schedulers (Life Sciences / EDA / CFD / CAE), IBM Platform Symphony with open-source Apache Hadoop, and IBM Platform Symphony with IBM InfoSphere BigInsights. Shared infrastructure: compute and storage dense nodes (System X or Power), virtual infrastructure, IBM GPFS, rack switch.]

Page 30:

IBM Platform Cluster Manager – Main Capabilities

• Multiple analytics and HPC clusters
  • Rapid provisioning: get the clusters you need in minutes instead of hours and days
  • Heterogeneous: deploy LSF, Symphony, Grid Engine, PBS, Hadoop, most 3rd party workload managers

• Dynamically grow and shrink clusters
  • Support expansion and shrinking of clusters as needed over time
  • Based on policy, calendar and user intervention
  • Share resources between clusters

• Multitenant
  • Account separation, different service catalogs, resource limits, per account reporting
  • Dynamic VLAN creation
  • Authenticated access to portal, service catalog, provisioned machines & storage

• Physical, virtual and hybrid clusters
  • Choose the right resource to match the workload
  • Bare metal provisioning
  • Switch management
  • GUI for multiple xCAT instances

• Self-service delivery and administration
  • Clusters are available on-demand when they are needed
  • Reduce/eliminate the need to wait for someone to act

• Consolidate
  • Breaks down silos and provides a larger resource pool

Page 31:

Platform Cluster Manager Cockpit view Manage physical hosts, virtual machines, clusters, tenant accounts, networks, storage and more

Page 32:

Design clusters for self-service with arbitrarily complex machine elements and software stacks complete with customizable pre and post-provisioning scripts

Page 33:

Automatically deploy ready-to-use analytic environments - InfoSphere BigInsights, Streams, DataStage, Platform Symphony, GPFS, Platform LSF or other analytic software environments

Page 34:

IBM InfoSphere BigInsights, Platform Computing
Extending the capabilities of IBM BigInsights

[Figure: The IBM Big Data Platform. Analytic applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics) sit on the Big Data Platform (Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Data Warehouse, Hadoop System, Stream Computing), which runs on an agile, multi-tenant shared infrastructure.]

Platform Symphony and GPFS provide significant advantages

• Improved performance
• More efficient use of infrastructure
• Diverse, concurrent workloads
• Dynamic resource allocation
• Fast workload pre-emption
• Sophisticated multi-tenancy
• Ease of management
• Guaranteed service levels

Page 35:

BigInsights, Platform Symphony & GPFS
Providing competitive advantage for Big Data infrastructure

Columns: Capability; Cloudera CDH; EMC / GP UAP; MapR; Hortonworks; Open Source; BigInsights Platform, GPFS FPO

• Low-latency scheduling: Impala only, No, Some features, No, No
• Heterogeneous workloads: No, No, No, No, No
• Fast pre-emptive scheduling: No, No, No, No, No
• Time-variant SLA guarantees: No, No, Some features, No, No
• Usage Accounting & Analytics add-on: No, No, No, No, No
• Recoverable Hadoop jobs: No, No, No, No, No
• POSIX file system: No, NFS only, No, No
• Enterprise file system features: No, No, No

Page 36:

BigInsights, Platform Symphony & GPFS
Providing competitive advantage for Big Data infrastructure

Columns: Capability; Cloudera CDH; EMC / GP UAP; MapR; Hortonworks; Open Source; BigInsights Platform, GPFS FPO

• SQL Support: Impala, Pivotal, Drill, Via open source only, Impala and Drill
• BigSheets: No, No, No, No, No
• External Data Connectors: GP DB built-in, No, No, No
• Accelerators: No, No, No, No, No
• Complete HW & Software solution: Through HW partners, No, No, No
• Single vendor support: Through HW partners, No, No, No
• Full-featured private cloud management: No, No, No, No, No

Page 37:

Summing up

IBM BigInsights, Platform Computing, GPFS FPO

• Single-tenant license for Platform Symphony included in BI 2.1

• Upgrade to Symphony Advanced Edition for resource sharing features

• Enterprise-class POSIX file system

• Advanced cluster provisioning, private cloud management

• The most complete infrastructure solution for Big Data analytics


Page 39:

ADDITIONAL SLIDES

Page 40:

Latency matters in Big Data Analytics

Other grid server (client, broker, engines), steps over time:
• Serialize input data
• Network transport (client to broker)
• Broker compute time
• Wait for engine to poll broker: each engine polls the broker ~5 times per second (configurable), and work is sent when the engine is ready
• Network transport (broker to engine)
• De-serialize input data
• Compute result
• Serialize result
• Post result back to broker

Platform Symphony, steps over time:
• Serialize input
• Network transport
• SSM compute time & logging
• Network transport (SSM to engine)
• De-serialize
• Compute result
• Serialize
• Network transport (engine to SSM)

No wait time due to polling, faster serialization/de-serialization, more network-efficient protocol

IBM Platform Symphony is (much) faster because:

• Efficient C language routines use CDR (common data representation) and IOCP rather than slow, heavy-weight XML data encoding

• Network transit time is reduced by avoiding the text-based HTTP protocol and encoding data in the more compact CDR binary format

• Processing time for all Symphony services is reduced by using a native HPC C/C++ implementation for system services rather than Java

• Platform Symphony has a more efficient “push model” that entirely avoids the architectural problems with polling

Being more efficient means getting more work done with fewer resources
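
As a rough illustration of why polling adds latency, the Python sketch below estimates how long a task waits at the broker before an engine's next poll picks it up. It assumes that work arrives at a uniformly random moment within the polling interval, which is an assumption made for the example and not a claim from the deck; the function names are hypothetical. Under that assumption the mean wait is half the polling interval, so at about 5 polls per second each task waits roughly 100 ms on average, a term that a push model does not incur.

```python
import random


def mean_poll_wait(polls_per_second: float) -> float:
    """Expected wait in seconds for the next poll, assuming work arrives
    at a uniformly random time within the polling interval."""
    interval = 1.0 / polls_per_second
    return interval / 2.0


def simulate_poll_wait(polls_per_second: float, trials: int = 100_000,
                       seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    interval = 1.0 / polls_per_second
    total_wait = 0.0
    for _ in range(trials):
        elapsed_since_poll = rng.uniform(0.0, interval)
        total_wait += interval - elapsed_since_poll
    return total_wait / trials


if __name__ == "__main__":
    rate = 5  # polls per second, the default quoted on the slide
    print(f"analytic mean wait:  {mean_poll_wait(rate) * 1000:.1f} ms")
    print(f"simulated mean wait: {simulate_poll_wait(rate) * 1000:.1f} ms")
```

For short tasks this fixed wait can rival the compute time itself, which is consistent with the slide's point that scheduling latency matters most for workloads made up of many small tasks.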

Page 41:

7.5x Faster

Benchmark: SWIM: Facebook 2010 Workload

Page 42:

Understanding the advantage

Symphony 6.1 can schedule ~50x more tasks per second

Hadoop results taken from Hadoop World 2011 performance presentation, Lipcon & Chen

Hadoop 1.1.1