Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform
Introducing IBM’s InfoSphere BigInsights
Cynthia M. Saracco
Senior Solution Architect
IBM Silicon Valley Lab
2 © 2013 IBM Corporation
IBM Big Data Platform Strategy
Analytic Applications: BI / Reporting, Exploration / Visualization, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance
• Integrate and manage the full range of Big Data
• Apply advanced analytics
• Explore and visualize data for ad hoc analysis
• Speed development of new analytic applications
• Provide high levels of performance and scalability
• Integrate with enterprise software
BigInsights Brings Hadoop to the Enterprise
• BigInsights = analytical platform for persistent Big Data
– Based on open source & IBM technologies
– Deep customer engagements, product plan flexibility
• Distinguishing characteristics
– Built-in analytics: enhances business knowledge
– Enterprise software integration: complements and extends existing capabilities
– Production-ready platform with tooling for analysts, developers, and administrators: speeds time-to-value; simplifies development and maintenance
• IBM advantage
– Combination of software, hardware, services and advanced research
From Getting Started to Enterprise Deployment:
Different BigInsights Editions For Varying Needs
Apache Hadoop
Quick Start: Free. Non-production. Same features as Standard Edition plus text analytics and Big R
Standard Edition: Breadth of capabilities
- Spreadsheet-style tool
- Web console
- Dashboards
- Pre-built applications
- Eclipse tooling
- RDBMS connectivity
- Big SQL
- Monitoring and alerts
- Platform enhancements
- . . .
Enterprise Edition: Enterprise class
- Accelerators
- GPFS – FPO
- Adaptive MapReduce
- Text analytics
- Big R
- Enterprise Integration
-- InfoSphere Streams*
-- Watson Explorer*
-- Cognos BI*
-- Data Click*
-- . . .
* Limited use license
BigInsights Content
Function | Version | Open Source | Enterprise Edition
Integrated Install | | Included | Included
Hadoop (including common utilities, HDFS, MapReduce v1) | 2.2 | Included | Included
Pig (programming / query language) | 0.12.0 | Included | Included
Flume (data collection/aggregation) | 1.3.1 | Included | Included
Hive (data summarization/querying) | 0.12.0 | Included | Included
Lucene (text search) | 4.7.0 | Included | Included
Solr (enterprise search based on Lucene) | 4.7.0 | Included | Included
Zookeeper (process coordination) | 3.4.5 | Included | Included
Avro (data serialization) | 1.7.4 | Included | Included
HBase (real time read/write) | 0.96.0 | Included | Included
Sqoop (RDBMS bulk data transfer) | 1.4.3 | Included | Included
BigInsights Content (cont’d)
Function | Open Source | Enterprise Edition
Big SQL (standard SQL query support, JDBC/ODBC drivers, LOAD from RDBMSs, etc.) | n/a | Included
Integration with Netezza, DB2 LUW with DPF from Jaql | n/a | Included
Big R (support for Project R statistics and visualization) | n/a | Included
LDAP authentication, Kerberos authentication, Guardium support, etc. | n/a | Included
Web console with admin facilities, application catalog, etc. | n/a | Included
Business process accelerators (social data, machine data analytics) | n/a | Included
Platform enhancements (GPFS-FPO, Adaptive MapReduce, efficient processing of compressed text files, flexible job scheduler, high availability, monitoring and alerts, etc.) | n/a | Included
Text analytics | n/a | Included
Eclipse tools for text analytic development, Jaql, Hive, Java, Big SQL, … | n/a | Included
Applications for data import/export, social media, ad hoc query, etc. | n/a | Included
Spreadsheet-like analytical tool | n/a | Included
IBM support | n/a | Included
Streams, Watson Explorer, Data Click, Cognos BI (limited use licenses) | n/a | Included
Unlimited storage | n/a | Included
A Closer Look at BigInsights . . . .
Web Installation Tool
• Seamless process for single-node and cluster environments
• Integrated installation of all selected components
• Post-install validation of IBM and open source components
• Get up and running quickly! No need to iteratively download, configure, and test multiple open source projects and prerequisite software.
Integrated Web Console
• Manage BigInsights
– Inspect / monitor system health
– Add / drop nodes
– Start / stop services
– Run / monitor jobs (applications)
– Explore / modify file system
– Create custom dashboards
– . . .
• Launch applications
– Spreadsheet-like analysis tool
– Pre-built applications (IBM supplied or user developed)
• Publish applications
• Monitor cluster, applications, data, etc.
Spreadsheet-style Analysis
• Web-based analysis and visualization
• Spreadsheet-like interface
– Define and manage long-running data collection jobs
– Analyze the text content of the pages that have been retrieved
Big Data Application Ecosystem
[Diagram: App development in Eclipse → Publish → App library in the Big Data Web Console]
App development (Eclipse): MapReduce, …, text analytics, query
• Code the application program and generate the associated app
• Deploy apps to the Enterprise Manager
Data integration scenario: pre-defined workflows simplify loading data from various sources
• Workflows can be configured, deployed, executed and scheduled
Development tooling:
• Text analytics
• MapReduce
• Query languages
• . . .
Application scenarios (web log, email, social media, …):
• Samples provide a starting point and speed time to value
Pre-built Applications
• 20+ software samples based on common customer needs
– Useful as a starting point for various applications
– Accessible through the Web console
• Available assets
– Data movement
• From relational DBMS, files, REST-based sources
• To relational DBMS, files
– Web crawler, social media data collectors, etc.
– Ad hoc query
– Monitoring
– Data sampling and subsetting
– TeraGen-TeraSort, WordCount sample applications
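The WordCount sample listed above follows the classic MapReduce pattern. As a rough, single-process illustration only (this is not the BigInsights sample's code and it does not use the Hadoop APIs), the map, shuffle, and reduce phases can be sketched in Python; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big insights", "Big SQL"]))
    # {'big': 3, 'data': 1, 'insights': 1, 'sql': 1}
```

On a cluster, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the in-memory dictionary here only stands in for that step.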
Running Applications from the Web Console
Chaining Applications (Drag-and-Drop)
Building a Big Data program – Big SQL example
• BigInsights plug-in: Java MapReduce, Big SQL, Jaql, Hive, Pig, text analytics, etc.
Visualizing Results through Dashboards
• Built-in dashboards for monitoring system health, application status, distributed file system, etc.
• Easy to customize: add, group, or remove widgets for:
– BigSheets collections and charts
– Cluster/system monitoring
– HDFS monitoring
– MapReduce metrics
• Third-party widgets or OpenSocial gadgets can be added to a dashboard
• Create new, custom dashboards to suit your needs!
Big SQL 3.0
11-Apr-2014
BigInsights and Text Analytics
• Distills structured information from unstructured text
– Sentiment analysis
– Consumer behavior
– Illegal or suspicious activities
– …
• Parses text and detects meaning with annotators
• Understands the context in which the text is analyzed
• Features pre-built extractors for names, addresses, phone numbers, etc.
• Built-in support for English, Spanish, French, German, Portuguese, Dutch, Japanese, Chinese
[Diagram: Unstructured text (document, email, etc.) → Classification and Insight]
Sample unstructured text:
"Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win."
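BigInsights text analytics builds extractors from annotators (the product uses its own AQL language, which is not shown here). Purely as a hypothetical illustration of one simple technique, dictionary matching, the Python sketch below tags the person and team names in the sample passage above. The dictionaries and the `extract_entities` function are assumptions made for this example; real extractors combine dictionaries with context rules rather than relying only on exact string matches.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionaries; a production extractor would be far larger
# and would combine dictionaries with context-sensitive rules.
DICTIONARIES: Dict[str, List[str]] = {
    "Person": ["Arjen Robben", "Iker Casillas", "Andres Iniesta"],
    "Team": ["Netherlands", "Spain"],
}

SAMPLE = (
    "Early in the second half, Netherlands' striker, Arjen Robben, had a "
    "breakaway, but the keeper for Spain, Iker Casillas made the save. "
    "Winger Andres Iniesta scored for Spain for the win."
)


def extract_entities(text: str) -> List[Tuple[str, str, int]]:
    """Return (entity type, matched text, start offset) tuples, sorted by offset."""
    matches: List[Tuple[str, str, int]] = []
    for entity_type, terms in DICTIONARIES.items():
        for term in terms:
            # Word boundaries keep a term from matching inside a longer word.
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
                matches.append((entity_type, m.group(0), m.start()))
    return sorted(matches, key=lambda match: match[2])


if __name__ == "__main__":
    for entity_type, value, offset in extract_entities(SAMPLE):
        print(f"{offset:4d}  {entity_type:<7} {value}")
```

The structured output (type, text, offset) is what downstream classification or a spreadsheet-style view would consume.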
Text Analytics Lifecycle
Big R
[Diagram: R clients, data sources, scalable statistics engine, embedded R execution, R packages]
1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
2. Scale out R
• Partitioning of large data (“divide”)
• Parallel cluster execution of pushed-down R code (“conquer”)
• All of this from within the R environment (Jaql and MapReduce are hidden from you)
• Almost any R package can run in this environment
3. Scalable machine learning
• A scalable statistics engine that provides canned algorithms and the ability to author new ones, all via R
“End-to-end integration of R into IBM BigInsights”
Pull data (summaries) to the R client, or push R functions right onto the data.
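The “divide” and “conquer” steps, and the option of pulling only summaries back to the client, can be illustrated with a generic partial-aggregation sketch. This is Python rather than R and does not use the Big R API; the thread pool merely stands in for cluster nodes, and `partition`, `summarize`, `combine`, and `distributed_mean` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(values: Sequence[float], n_parts: int) -> List[Sequence[float]]:
    """Divide: split the data into roughly equal contiguous chunks."""
    size = -(-len(values) // n_parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def summarize(chunk: Sequence[float]) -> Tuple[float, int]:
    """Conquer: runs next to each chunk and returns only a small summary."""
    return sum(chunk), len(chunk)


def combine(summaries: List[Tuple[float, int]]) -> float:
    """Client side: merge the per-partition summaries into a global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


def distributed_mean(values: Sequence[float], n_parts: int = 4) -> float:
    if not values or n_parts < 1:
        raise ValueError("need non-empty data and at least one partition")
    chunks = partition(values, n_parts)
    # Each worker stands in for a node processing its local partition.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        summaries = list(pool.map(summarize, chunks))
    return combine(summaries)


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 101))))  # 50.5
```

The mean is used because it decomposes into per-partition (sum, count) pairs, so only two numbers per partition travel back to the client; a statistic such as the exact median does not decompose this simply.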
Application Accelerators: Quickly build and deploy custom applications in high-value areas
IBM Accelerator for Telco Event Data Analytics
• Telcos
• Campaign management, real-time promotion, fraud detection, service assurance and network monitoring
• Ships with Streams v3, but works with BigInsights or PureData System for Analytics (a.k.a. Netezza)
IBM Accelerator for Social Data Analytics
• B2C businesses
• Sample applications: customer acquisition / retention, customer segmentation or micro-segmentation, marketing campaign optimization, lead generation, brand management or surveillance
• Ships with BigInsights v2 and Streams v3
IBM Accelerator for Machine Data Analytics
• Cross-industry: manufacturing, oil & gas, energy and utility, healthcare, travel and transportation, CPG, retail, etc.
• Operational efficiency monitoring, security incident investigation, proactive maintenance, troubleshooting, outage prevention, efficiency tracking, etc.
• Ships with BigInsights v2
Adaptive MapReduce (Platform Symphony) option
[Diagram: timeline of one task under another grid server versus Platform Symphony]
Other grid server (broker and engines):
• Each engine polls the broker ~5 times per second (configurable); work is sent when an engine is ready
• Timeline: client serializes input data → network transport (client to broker) → wait for engine to poll broker → network transport (broker to engine) → de-serialize input data → compute result → serialize result → post result back to broker
Platform Symphony:
• Timeline: serialize input → network transport → SSM compute time & logging → network transport (SSM to engine) → de-serialize → compute result → serialize → network transport (engine to SSM)
• No wait time due to polling, faster serialization/de-serialization, more network-efficient protocol
Platform Symphony advantages:
• Efficient C language routines use CDR (common data representation) and IOCP rather than slow, heavyweight XML data encoding
• Network transit time is reduced by avoiding the text-based HTTP protocol and encoding data in the more compact CDR binary format
• Processing time for all Symphony services is reduced by using a native HPC C/C++ implementation for system services rather than Java
• Platform Symphony has a more efficient “push model” that entirely avoids the architectural problems of polling
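To put the polling overhead in rough numbers: if each engine polls the broker about 5 times per second, the poll interval is 200 ms. Assuming work becomes available at a random point within that interval (an assumption, not a figure from the slide), it waits 100 ms on average, and up to 200 ms, before an engine picks it up, which is large relative to very short tasks. The sketch below computes this and checks the average numerically; it ignores network and serialization costs and treats the push model as adding no dispatch wait. The function names are hypothetical.

```python
def expected_poll_wait(polls_per_second: float) -> float:
    """Mean wait (seconds) for uniformly random arrivals: half the poll interval."""
    return 0.5 / polls_per_second


def simulated_poll_wait(polls_per_second: float, samples: int = 10_000) -> float:
    """Average the wait over evenly spaced arrival times within one poll interval."""
    interval = 1.0 / polls_per_second
    total = 0.0
    for i in range(samples):
        arrival = (i + 0.5) * interval / samples  # midpoint of each sub-slot
        total += interval - arrival  # time until the next poll tick
    return total / samples


if __name__ == "__main__":
    rate = 5.0  # ~5 polls per second, per the slide (configurable)
    print(f"expected wait:  {expected_poll_wait(rate) * 1000:.0f} ms")
    print(f"simulated wait: {simulated_poll_wait(rate) * 1000:.0f} ms")
    # If each of 1,000 tasks incurs the average wait, the added latency is:
    print(f"1,000 tasks: ~{1000 * expected_poll_wait(rate):.0f} s of cumulative waiting")
```

Raising the polling rate shrinks the wait but adds broker load, which is the trade-off a push model sidesteps.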
GPFS – FPO
• File system alternative to HDFS (optional)
• Key features
– No single point of failure
– Built-in high availability
– POSIX compliance
– Enhanced security with ACL support
– Support for storage pools
– Snapshot capability
InfoSphere Data Click: self-service data integration on demand
• Broad connectivity: traditional and big data sources
• Simple end-to-end experience: web-based configuration
Growing Ecosystem of Solutions
• IBM solutions (e.g., Platform Symphony, Cognos Consumer Insight)
• Partner solutions
. . . with more to come
BigInsights and the data warehouse
[Diagram: BigInsights → Filter / Transform / Aggregate → Data warehouse; big data analytic applications and traditional analytic tools]
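The filter, transform, and aggregate flow in the diagram can be pictured as a small pipeline that reduces raw records to summary rows before they are loaded into the warehouse. The record layout (`timestamp,region,status,amount`) and the function names below are hypothetical, and the code runs in a single process; on BigInsights, such logic would typically be written in one of the supported languages (for example Big SQL, Jaql, Pig, Hive, or Java MapReduce) and run across the cluster.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw records, as they might land in the distributed file system.
RAW = """timestamp,region,status,amount
2014-04-01T10:00:00,west,OK,12.50
2014-04-01T10:05:00,east,ERROR,0
2014-04-01T11:00:00,west,OK,7.50
2014-04-02T09:30:00,east,OK,20.00
"""


def filter_records(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Filter: drop records that failed upstream."""
    return (r for r in rows if r["status"] == "OK")


def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, str, float]]:
    """Transform: keep the day part of the timestamp and parse the amount."""
    for r in rows:
        yield r["timestamp"][:10], r["region"], float(r["amount"])


def aggregate(rows: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, str, float]]:
    """Aggregate: total amount per (day, region), ready for a warehouse load."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for day, region, amount in rows:
        totals[(day, region)] += amount
    return sorted((day, region, total) for (day, region), total in totals.items())


def run_pipeline(raw: str) -> List[Tuple[str, str, float]]:
    reader = csv.DictReader(io.StringIO(raw))
    return aggregate(transform(filter_records(reader)))


if __name__ == "__main__":
    for row in run_pipeline(RAW):
        print(row)
```

Only the compact summary rows, not the raw records, would need to move into the warehouse, which is the point of doing this reduction on the Hadoop side.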
BigInsights and the data warehouse
• BigInsights: query-ready platform for “cold” warehouse data
[Diagram: BigInsights, data warehouse, big data analytic applications, traditional analytic tools]
BigInsights: Value Beyond Open Source
[Diagram layers: Open source components (IBM-certified Apache Hadoop and related projects); Enterprise Capabilities: Administration & Security, Workload Optimization, Connectors, Advanced Engines, Visualization & Exploration, Development Tools]
Key differentiators
• Built-in text analytics
• Enterprise software integration
• SQL support
• Spreadsheet-style analysis
• Integrated installation of supported open source and other components
• Web Console for admin and application access
• Platform enrichment: additional security, performance features, GPFS (alternative file system), . . .
• World-class support
• Full open source compatibility
Business benefits
• Quicker time-to-value due to IBM technology and support
• Reduced operational risk
• Enhanced business knowledge with flexible analytical platform
• Leverages and complements existing software
Want to learn more?
• Download Quick Start Edition
• Test drive the technologies
– Follow online tutorials
– Enroll in online classes
– Watch video demos, read articles, etc.
• Links all available from HadoopDev: https://developer.ibm.com/hadoop/