Big Data and Hadoop: Lab at Innovate 2014

Big Data Analytics: Apache Hadoop and InfoSphere BigInsights. June 5, 2014. C. M. Saracco and Nicolas Morales, IBM Silicon Valley Lab


Page 1

Big Data Analytics: Apache Hadoop and InfoSphere BigInsights

June 5, 2014

C. M. Saracco and Nicolas Morales

IBM Silicon Valley Lab

Page 2

Information is at the Center of a New Wave of Opportunity… and Organizations Need Deeper Insights

• 44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
• 80% of the world’s data is unstructured

Page 3

Big Data use study

2012 Big Data @ Work Study, surveying 1,144 business and IT professionals in 95 countries

Page 4

About this lab

• Application scenarios
– Explore coverage of a popular brand (IBM Watson) in social media
– Offload “cold” data warehouse data and query/analyze it

• Technologies
– Apache Hadoop and complementary open source offerings (InfoSphere BigInsights Quick Start Edition, a free public download)
– Web console
– BigSheets (spreadsheet-style interface)
– Big SQL
– Eclipse plug-in
– . . .


Page 5

BigInsights Enterprise Edition

[Architecture diagram. Legend: Open Source; IBM; Optional IBM and partner offerings; Used in today’s lab]

Analytics and discovery
• “Apps”: BigSheets, Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, . . .
• Text processing engine and library
• Big R
• Accelerator for machine data analysis
• Accelerator for social data analysis

Administrative and development tools
• Web console
– Monitor cluster health, jobs, etc.
– Add / remove nodes
– Start / stop services
– Inspect job status
– Inspect workflow status
– Deploy applications
– Launch apps / jobs
– Work with distrib file system
– Work with spreadsheet interface
– Support REST-based API
– Create / view alerts
– . . .
• Eclipse tools
– Text analytics
– MapReduce programming
– Jaql, Hive, Pig development
– BigSheets plug-in development
– Oozie workflow generation
• Integrated installer

Infrastructure
• Jaql, Hive, Pig, HBase, MapReduce, HDFS, ZooKeeper, Indexing, Lucene, Adaptive MapReduce, Oozie, Text compression, Enhanced security, Flexible scheduler, Big SQL, HCatalog, GPFS-FPO

Connectivity and integration; optional IBM and partner offerings
• Streams, Netezza, JDBC, Flume, DB2, Sqoop, Cognos BI, Guardium, DataStage, Data Explorer

Page 6

Want to learn more?

• Download Quick Start Edition
• Test drive the technologies
– Follow online tutorials
– Enroll in online classes
– Watch video demos, read articles, etc.
• Links are all available from the BigInsights wiki
– Technical portal at http://tinyurl.com/biginsights

Page 7

Questions?