Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Introducing IBM’s InfoSphere BigInsights. Cynthia M. Saracco, Senior Solution Architect, IBM Silicon Valley Lab

Description: Technical introduction to IBM's InfoSphere BigInsights platform for managing and analyzing Big Data. Updated July 2014 for BigInsights 3.0.

Transcript of Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Page 1: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Introducing IBM’s InfoSphere BigInsights

Cynthia M. Saracco

Senior Solution Architect

IBM Silicon Valley Lab


Page 2: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

2 © 2013 IBM Corporation

IBM Big Data Platform Strategy

[Diagram: Analytic Applications (BI / Reporting, Exploration / Visualization, Industry App, Predictive Analytics, Content Analytics) on top of the IBM Big Data Platform (Systems Management, Application Development, Visualization & Discovery, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance)]

• Integrate and manage the full range of Big Data
• Apply advanced analytics
• Explore and visualize data for ad hoc analysis
• Speed development of new analytic applications
• Provide high levels of performance and scalability
• Integrate with enterprise software

Page 3: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights Brings Hadoop to the Enterprise

• BigInsights = analytical platform for persistent Big Data
  – Based on open source & IBM technologies
  – Deep customer engagements, product plan flexibility
• Distinguishing characteristics
  – Built-in analytics: enhances business knowledge
  – Enterprise software integration: complements and extends existing capabilities
  – Production-ready platform with tooling for analysts, developers, and administrators: speeds time-to-value; simplifies development and maintenance
• IBM advantage
  – Combination of software, hardware, services and advanced research

Page 4: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

From Getting Started to Enterprise Deployment: Different BigInsights Editions for Varying Needs

[Diagram: editions built on Apache Hadoop, arranged by breadth of capabilities and enterprise class]

Quick Start Edition
- Free. Non-production
- Same features as Standard Edition plus text analytics and Big R

Standard Edition
- Spreadsheet-style tool
- Web console
- Dashboards
- Pre-built applications
- Eclipse tooling
- RDBMS connectivity
- Big SQL
- Monitoring and alerts
- Platform enhancements
- . . .

Enterprise Edition
- Accelerators
- GPFS – FPO
- Adaptive MapReduce
- Text analytics
- Enterprise Integration
- Big R
- InfoSphere Streams*
- Watson Explorer*
- Cognos BI*
- Data Click*
- . . .

* Limited use license

Page 5: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights Content

Function | Version | Open Source | Enterprise Edition
Integrated Install | | Included | Included
Hadoop (including common utilities, HDFS, MapReduce v1) | 2.2 | Included | Included
Pig (programming / query language) | 0.12.0 | Included | Included
Flume (data collection/aggregation) | 1.3.1 | Included | Included
Hive (data summarization/querying) | 0.12.0 | Included | Included
Lucene (text search) | 4.7.0 | Included | Included
Solr (enterprise search based on Lucene) | 4.7.0 | Included | Included
Zookeeper (process coordination) | 3.4.5 | Included | Included
Avro (data serialization) | 1.7.4 | Included | Included
HBase (real time read/write) | 0.96.0 | Included | Included
Sqoop (RDBMS bulk data transfer) | 1.4.3 | Included | Included

Page 6: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights Content (cont’d)

Function | Open Source | Enterprise Edition
Big SQL (standard SQL query support, JDBC/ODBC drivers, LOAD from RDBMSs, etc.) | n/a | Included
Integration with Netezza, DB2 LUW with DPF from Jaql | n/a | Included
Big R (support for Project R statistics and visualization) | n/a | Included
LDAP authentication, Kerberos authentication, Guardium support, etc. | n/a | Included
Web console with admin facilities, application catalog, etc. | n/a | Included
Business process accelerators (social data, machine data analytics) | n/a | Included
Platform enhancements (GPFS-FPO, Adaptive MapReduce, efficient processing of compressed text files, flexible job scheduler, high availability, monitoring and alerts, etc.) | n/a | Included
Text analytics | n/a | Included
Eclipse tools for text analytic development, Jaql, Hive, Java, Big SQL, ... | n/a | Included
Applications for data import/export, social media, ad hoc query, etc. | n/a | Included
Spreadsheet-like analytical tool | n/a | Included
IBM support | n/a | Included
Streams, Watson Explorer, Data Click, Cognos BI (limited use licenses) | n/a | Included
Unlimited storage | n/a | Included

Page 7: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


A Closer Look at BigInsights . . . .

Page 8: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Web Installation Tool

• Seamless process for single node and cluster environments
• Integrated installation of all selected components
• Post-install validation of IBM and open source components
• Get up and running quickly! No need to iteratively download, configure, and test multiple open source projects and pre-requisite software.

Page 9: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Integrated Web Console

• Manage BigInsights
  – Inspect / monitor system health
  – Add / drop nodes
  – Start / stop services
  – Run / monitor jobs (applications)
  – Explore / modify file system
  – Create custom dashboards
  – . . .
• Launch applications
  – Spreadsheet-like analysis tool
  – Pre-built applications (IBM supplied or user developed)
• Publish applications
• Monitor cluster, applications, data, etc.

Page 10: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Spreadsheet-style Analysis

• Web-based analysis and visualization
• Spreadsheet-like interface
  – Define and manage long running data collection jobs
  – Analyze content of the text on the pages that have been retrieved

Page 11: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Big Data Application Ecosystem

[Diagram: App Development in Eclipse (MapReduce, ..., Text Analytics, Query) publishes to the App library in the Big Data Web Console]

App Development
• Code application program, and generate associated App
• Deploy Apps to Enterprise Manager

Data integration scenario:
• Pre-defined work flows simplify loading data from various sources
• Work flows can be configured, deployed, executed and scheduled

Development tooling:
• Text analytics
• MapReduce
• Query languages
• . . .

Application scenarios (web log, email, social media, ...):
• Samples provide starting point, speed time to value

Page 12: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Pre-built Applications

• 20+ software samples based on common customer needs
  – Useful as a starting point for various applications
  – Accessible through Web console
• Available assets
  – Data movement
    • From relational DBMS, files, REST-based sources
    • To relational DBMS, files
  – Web crawler, social media data collectors, etc.
  – Ad hoc query
  – Monitoring
  – Data sampling and subsetting
  – TeraGen-TeraSort, WordCount sample applications
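
For readers unfamiliar with the WordCount sample named above, the following is a minimal single-process sketch of the map and reduce steps it illustrates. It is not the BigInsights sample application itself: the function names are hypothetical, and a real Hadoop job would distribute both phases across the cluster.

```python
from collections import Counter
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run map over every line, then reduce all emitted pairs."""
    return reduce_phase(chain.from_iterable(map_phase(line) for line in lines))


if __name__ == "__main__":
    print(word_count(["big data", "Big SQL"]))
```

In a cluster setting the pairs emitted by the map step are grouped by key before reduction; here the grouping is implicit in the dictionary update.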

Page 13: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


Running Applications from the Web Console

Page 14: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


Chaining Applications (Drag-and-Drop)

Page 15: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


Building a Big Data program – Big SQL example

BigInsights plug-in

Java MapReduce, Big SQL, Jaql, Hive, Pig, text analytics, etc.

Page 16: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


Visualizing Results through Dashboards

• Built-in dashboards for monitoring system health, application status, distributed file system, etc.

• Easy to customize . . . . Add, group, or remove widgets for:

• BigSheets collections and charts

• Cluster/system Monitoring

• HDFS monitoring

• MapReduce metrics

• Third party Widgets or Open Social Gadgets can be added to a dashboard

• Create new, custom dashboards to suit your needs!

Page 17: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


Big SQL 3.0

11-Apr-2014

Page 18: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights and Text Analytics

• Distills structured info from unstructured text
  – Sentiment analysis
  – Consumer behavior
  – Illegal or suspicious activities
  – ...
• Parses text and detects meaning with annotators
• Understands the context in which the text is analyzed
• Features pre-built extractors for names, addresses, phone numbers, etc.
• Built-in support for English, Spanish, French, German, Portuguese, Dutch, Japanese, Chinese

Unstructured text (document, email, etc.):
Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win.

[Diagram: unstructured text is turned into classification and insight]
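
To make "distilling structured info from unstructured text" concrete, here is a small sketch that combines dictionaries with regular expressions to pull people, teams, and a score out of the sample passage. It is only an illustration in plain Python, not the BigInsights text analytics engine or its extractor language; the dictionaries `FIRST_NAMES` and `TEAMS` and the function `extract` are hypothetical and hand-built for this one passage.

```python
import re

TEXT = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. Early in the "
    "second half, Netherlands’ striker, Arjen Robben, had a breakaway, "
    "but the keeper for Spain, Iker Casillas made the save. Winger "
    "Andres Iniesta scored for Spain for the win."
)

# Hypothetical dictionaries; a real extractor would ship far larger ones.
FIRST_NAMES = ["Arjen", "Iker", "Andres"]
TEAMS = ["Netherlands", "Spain"]


def extract(text):
    """Return people, teams, and score found in the text."""
    first = "|".join(re.escape(name) for name in FIRST_NAMES)
    # Rule: a known first name followed by one capitalized word is a person.
    people = re.findall(rf"\b(?:{first}) [A-Z][a-z]+\b", text)
    # Dictionary lookup: keep the teams that are mentioned at least once.
    teams = sorted(t for t in TEAMS if re.search(rf"\b{re.escape(t)}\b", text))
    # Pattern: digits, hyphen, digits is treated as a match score.
    score = re.search(r"\b\d+-\d+\b", text)
    return {
        "people": people,
        "teams": teams,
        "score": score.group(0) if score else None,
    }


if __name__ == "__main__":
    print(extract(TEXT))
```

The result is a small record with typed fields, which is the kind of structured output that downstream queries or dashboards can consume.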

Page 19: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform


Text Analytics Lifecycle

Page 20: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Big R

“End-to-end integration of R into IBM BigInsights”

[Diagram components: R Clients, R Packages, Scalable Statistics Engine, Embedded R Execution, Data Sources; callouts 1 to 3 match the list below]

1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
2. Scale out R
   • Partitioning of large data (“divide”)
   • Parallel cluster execution of pushed down R code (“conquer”)
   • All of this from within the R environment (Jaql, Map/Reduce are hidden from you)
   • Almost any R package can run in this environment
3. Scalable machine learning
   • A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R

Pull data (summaries) to R client. Or, push R functions right on the data.
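
The "divide" and "conquer" idea can be sketched generically as: partition the data, run the same function on every partition in parallel, and combine the small per-partition summaries on the client. The example below is an assumption-laden illustration in Python using local threads, not Big R or its API; `partition`, `partial_stats`, and `distributed_mean` are hypothetical names, and the input is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """'Divide': split rows into n_parts interleaved chunks."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_stats(chunk):
    """Function 'pushed down' to one partition: returns (count, sum)."""
    return len(chunk), sum(chunk)


def distributed_mean(rows, n_parts=4):
    """'Conquer': run partial_stats per partition, then combine summaries."""
    chunks = [c for c in partition(rows, n_parts) if c]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(partial_stats, chunks))
    total_count = sum(count for count, _ in results)
    total_sum = sum(subtotal for _, subtotal in results)
    return total_sum / total_count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))
```

Only the (count, sum) pairs travel back to the caller, which mirrors the "pull data (summaries) to R client" path, while `partial_stats` plays the role of a function pushed to the data.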

Page 21: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Application Accelerators: Quickly build, deploy custom applications in high-value areas

IBM Accelerator for Telco Event Data Analytics
• Telcos
• Campaign management, real-time promotion, fraud detection, service assurance and network monitoring
• Ships with Streams v3, but works with BigInsights or PureData for Analytics (a.k.a. Netezza)

IBM Accelerator for Social Data Analytics
• B2C businesses
• Sample applications: Customer acquisition / retention, Customer Segmentation or Micro Segmentation, Marketing Campaign Optimization, Lead generation, Brand Management or Surveillance
• Ships with BigInsights v2 and Streams v3

IBM Accelerator for Machine Data Analytics
• Cross-industry: manufacturing, oil & gas, energy and utility, healthcare, travel and transportation, CPG, Retail, etc.
• Operational efficiency monitoring, security incident investigation, proactive maintenance, troubleshooting, outage prevention, efficiency tracking, etc.
• Ships with BigInsights v2

Page 22: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Adaptive MapReduce (Platform Symphony) option

Other Grid Server (client, broker, engines), timeline:
Serialize input data -> Network transport (client to broker) -> Wait for engine to poll broker -> Network transport (broker to engine) -> De-serialize input data -> Compute result -> Serialize result -> Post result back to broker -> ...
• Each engine polls broker ~5 times per second (configurable)
• Send work when engine ready

Platform Symphony advantages:
• Efficient C language routines use CDR (common data representation) and IOCP rather than slow, heavy-weight XML data encoding
• Network transit time is reduced by avoiding text-based HTTP protocol and encoding data in more compact CDR binary format
• Processing time for all Symphony services is reduced by using a native HPC C/C++ implementation for system services rather than Java
• Platform Symphony has a more efficient “push model” that avoids entirely the architectural problems with polling

Platform Symphony, timeline:
Serialize input -> Network transport -> SSM compute time & logging -> Network transport (SSM to engine) -> De-serialize -> ... (compute result) -> Serialize -> Network transport (engine to SSM)

No wait time due to polling, faster serialization/de-serialization, more network-efficient protocol
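
The cost of polling can be estimated with simple arithmetic. The sketch below assumes that work arrives at a uniformly random moment within a polling interval, which is an assumption of this example rather than a claim from the slide; the function names are hypothetical. At the quoted rate of about 5 polls per second the interval is 0.2 s, so the average added wait is about 0.1 s per task and the worst case about 0.2 s.

```python
def expected_poll_wait(polls_per_second):
    """Mean wait in seconds before the next poll picks up new work,
    assuming work arrives uniformly at random within a polling interval."""
    interval = 1.0 / polls_per_second
    return interval / 2.0


def worst_case_poll_wait(polls_per_second):
    """Longest wait in seconds: work arrives just after a poll."""
    return 1.0 / polls_per_second


# In a push model the polling wait is zero by construction;
# serialization, transport, and compute costs still apply.
PUSH_MODEL_POLL_WAIT = 0.0

if __name__ == "__main__":
    rate = 5  # polls per second, the configurable default quoted on the slide
    print(expected_poll_wait(rate), worst_case_poll_wait(rate))
```

For short tasks this wait can rival the compute time itself, which is why removing it matters more as tasks get smaller.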

Page 23: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

GPFS – FPO

• File system alternative to HDFS. Optional.
• Key features
  • No single point of failure
  • Built-in High Availability
  • POSIX compliance
  • Enhanced Security with ACL support
  • Support for Storage Pools
  • SnapShot capability

Page 24: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

InfoSphere Data Click: self-service data integration on-demand

• Broad connectivity: traditional and big data sources
• Simple end-to-end experience
• Web-based configuration

Page 25: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Growing Ecosystem of Solutions

IBM Solutions and Partner Solutions . . . with more to come

[Logos include: Platform Symphony, Cognos Consumer Insight]

Page 26: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights and the data warehouse

[Diagram labels: BigInsights; Data warehouse; Traditional analytic tools; Big Data analytic applications; Filter, Transform, Aggregate]
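
One plausible reading of the Filter, Transform, Aggregate labels is that raw data is reduced to compact summaries before it reaches the warehouse. The sketch below illustrates those three steps on a few made-up records in plain Python. The record layout, the sample data, and all function names are assumptions of this example and do not come from the slide.

```python
from collections import defaultdict

# Made-up raw records: day, channel, store, amount.
RAW = [
    "2014-07-01,web,store-1,19.99",
    "2014-07-01,web,store-2,5.00",
    "bad record",
    "2014-07-02,web,store-1,10.01",
]


def filter_valid(lines):
    """Filter: keep only lines with the expected four fields."""
    for line in lines:
        if len(line.split(",")) == 4:
            yield line


def transform(lines):
    """Transform: parse each line and convert the amount to integer cents."""
    for line in lines:
        day, _channel, store, amount = line.split(",")
        yield {"day": day, "store": store, "cents": round(float(amount) * 100)}


def aggregate(records):
    """Aggregate: total cents per store, ready to load as a summary table."""
    totals = defaultdict(int)
    for record in records:
        totals[record["store"]] += record["cents"]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(transform(filter_valid(RAW))))
```

The generators keep each stage streaming, so only the final per-store totals are held in memory.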

Page 27: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights and the data warehouse

BigInsights
• Query-ready platform for “cold” warehouse data

[Diagram labels: BigInsights; Data Warehouse; Big Data analytic applications; Traditional analytic tools]

Page 28: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

BigInsights: Value Beyond Open Source

[Diagram: Enterprise Capabilities (Administration & Security, Workload Optimization, Connectors, Advanced Engines, Visualization & Exploration, Development Tools) and Open source components (IBM-certified Apache Hadoop and related projects)]

Key differentiators
• Built-in text analytics
• Enterprise software integration
• SQL support
• Spreadsheet-style analysis
• Integrated installation of supported open source and other components
• Web Console for admin and application access
• Platform enrichment: additional security, performance features, GPFS (alternative file system), . . .
• World-class support
• Full open source compatibility

Business benefits
• Quicker time-to-value due to IBM technology and support
• Reduced operational risk
• Enhanced business knowledge with flexible analytical platform
• Leverages and complements existing software

Page 29: Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Want to learn more?

• Download Quick Start Edition
• Test drive the technologies
  – Follow online tutorials
  – Enroll in online classes
  – Watch video demos, read articles, etc.
• Links all available from HadoopDev – https://developer.ibm.com/hadoop/
