Hadoop Summit Japan 2011 Fall - LT by IBM

Data Discovery Tool BigSheets: MapReduce with No Coding? Atsushi Tsuchiya ([email protected]), Big Data Tiger Team, IBM Software

description

Data Discovery Tool for BigInsights (on top of Hadoop) - MapReduce with no coding.

Transcript of Hadoop Summit Japan 2011 Fall - LT by IBM

Page 1: Hadoop Summit Japan 2011 Fall - LT by IBM

Data Discovery Tool: BigSheets

MapReduce with No Coding?

Atsushi Tsuchiya ([email protected])
Big Data Tiger Team

IBM Software

Page 2: Hadoop Summit Japan 2011 Fall - LT by IBM

Looking at Data

• What would you do with Big Data?
• How would you make use of it?
• It is difficult! The question is too vague.
  – No specific problem that needs to be solved.
  – No specific question that needs to be answered.
  – All you know is that you want to improve the business.
• But you have *data*.
• So, what would you do first?

Look at the data!

Page 3: Hadoop Summit Japan 2011 Fall - LT by IBM

IBM with Hadoop

• IBM has been working with the open source community for a long time.
  – Eclipse, Hadoop and so on …
• BigInsights includes Hadoop.

Page 4: Hadoop Summit Japan 2011 Fall - LT by IBM

BigInsights

• BigInsights is IBM's Hadoop product for Big Data analytics.
  – Basic Edition (up to 10TB): free to use!
  – Enterprise Edition
• Next version of BigInsights coming soon.
  – v1.2 available.
• And many more

Page 5: Hadoop Summit Japan 2011 Fall - LT by IBM

BigInsights Components

• BigInsights includes:
  – IBM Java
  – JAQL: a language developed by IBM (open source)
  – IBM Distribution of Hadoop
  – BigSheets: a data exploration tool
  – FLEX scheduler for Adaptive MapReduce
  – Orchestrator (Workflow Engine)
  – SystemT (Text Analytics), SystemML (Machine Learning)
  – LDAP
  – Web Console / Developer Studio

Page 6: Hadoop Summit Japan 2011 Fall - LT by IBM

BigInsights – Basic Edition

Note: versions will be updated in the Nov release.

Function                                                          Version  Basic Edition  Enterprise Edition
Integrated Install                                                         Inc            Inc
Open Source components:
  Hadoop (including common utilities, HDFS, MapReduce framework)  0.20.2   Inc            Inc
  Jaql (programming / query language)                             0.5.2    Inc            Inc
  Pig (programming / query language)                              0.7      Inc            Inc
  Flume (data collection/aggregation)                             0.9.1    Inc            Inc
  Hive (data summarization/querying)                              0.5      Inc            Inc
  Lucene (text search)                                            3.0.2    Inc            Inc
  Zookeeper (process coordination)                                3.2.2    Inc            Inc
  Avro (data serialization)                                       1.3.0    Inc            Inc
  HBase (real time read/write)                                    0.20.6   Inc            Inc
  Oozie (workflow/job orchestration)                              2.2.2    Inc            Inc
Online documentation                                                       Inc            Inc
Capability to integrate with DB2, InfoSphere Warehouse:
  Two DB2 UDFs to submit jobs, and read results from BigInsights           Inc            Inc

Page 7: Hadoop Summit Japan 2011 Fall - LT by IBM

BigInsights – Enterprise Edition

Function                                                             Basic Edition  Enterprise Edition
R Connector:
  Jaql module to invoke R statistical capabilities from BigInsights  n/a            Inc
Netezza Connector:
  Jaql modules to read/write data from/to Netezza                    n/a            Inc
LDAP                                                                 n/a            Inc
Web Console                                                          n/a            Inc
Workflow Engine                                                      n/a            Inc
Scheduler (Orchestrator)                                             n/a            Inc
Text Analytics Module (System T)                                     n/a            Inc
Eclipse support (for System T)*                                      n/a            Inc
BigSheets – Data Discovery Tool                                      n/a            Inc
IBM Optim Development Studio V2.2.1.0                                n/a            Inc
Support by IBM                                                       n/a            Inc

Page 8: Hadoop Summit Japan 2011 Fall - LT by IBM

BigSheets

• A data exploration tool for Hadoop
• Only comes with BigInsights Enterprise Edition

Page 9: Hadoop Summit Japan 2011 Fall - LT by IBM

BigSheets Concept Model

No Coding is Required!

[Diagram: data sources (Internet, Intranet, Logs, Other) feed BigSheets on BigInsights. Labeled steps: Get/Gather, Enrich, Inspect, Explore, Manipulate, Publish. Labeled output: Explore & Analyze Massive Results in BigInsights.]

Page 10: Hadoop Summit Japan 2011 Fall - LT by IBM

It's like a spreadsheet.

Looks very familiar?!

Page 11: Hadoop Summit Japan 2011 Fall - LT by IBM

Visualizations

• Predefined visualizations
• Custom plug-ins

[Figure: the number of coffee shops in each state in North America.]
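The per-state coffee shop count above is a classic group-and-count aggregation, the kind of job MapReduce is built for. A minimal Python sketch of the map, shuffle and reduce steps follows. It only illustrates the idea and is not how BigSheets is implemented; the sample records, field names and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; real data would come from a BigSheets collection.
SHOPS = [
    {"name": "Shop A", "state": "CA"},
    {"name": "Shop B", "state": "NY"},
    {"name": "Shop C", "state": "CA"},
]


def map_phase(records):
    """Emit one (state, 1) pair per record."""
    for record in records:
        yield record["state"], 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_state(records):
    """Run the three phases in sequence on an in-memory list."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(count_by_state(SHOPS))
```

In a spreadsheet-style tool the user would only pick the column to group by; the three phases stay hidden.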

Page 12: Hadoop Summit Japan 2011 Fall - LT by IBM

DEMO

Page 13: Hadoop Summit Japan 2011 Fall - LT by IBM

Gather

[Diagram: BigSheets on BigInsights gathers data from the Internet, Intranet, Logs and Other sources.]

• BigInsights can gather data from
  – Predefined formats:
    • BigSheets data reader
    • Basic crawler data reader
    • Basic crawler data reader (binary support)
    • Character-delimited data reader
    • Tab Separated Value (TSV) data reader
    • JavaScript Object Notation (JSON) array reader
    • Comma Separated Value (CSV) data reader
  – Custom BigSheets reader
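To make the reader list concrete, here is a minimal Python sketch of what CSV, TSV and JSON array readers do: each turns raw text into the same row-oriented structure. This uses only the Python standard library and does not reflect the BigSheets reader API; the `READERS` table and the `gather` function are hypothetical names chosen for illustration.

```python
import csv
import io
import json


def read_delimited(text, delimiter=","):
    """Parse character-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_json_array(text):
    """Parse a JSON array of objects into a list of dicts."""
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    return rows


# Hypothetical registry mapping a format name to its reader.
READERS = {
    "csv": lambda text: read_delimited(text, ","),
    "tsv": lambda text: read_delimited(text, "\t"),
    "json": read_json_array,
}


def gather(text, fmt):
    """Read text in the given format and return uniform rows."""
    return READERS[fmt](text)


if __name__ == "__main__":
    print(gather("name,state\nShop A,CA\n", "csv"))
    print(gather("name\tstate\nShop A\tCA\n", "tsv"))
    print(gather('[{"name": "Shop A", "state": "CA"}]', "json"))
```

A custom reader in this sketch would simply be one more entry in the registry that returns rows in the same shape.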

Page 14: Hadoop Summit Japan 2011 Fall - LT by IBM

Gather

[Diagram: BigSheets on BigInsights gathers data from the Internet, Intranet, Logs and Other sources.]

• BigInsights can import structured and unstructured data
  – CSV
  – Files
  – Network
    • http
    • hdfs
    • AWS (S3n/S3)
  – Other
    • Custom importer
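One way to picture the network sources listed above is as a dispatch on the URI scheme. The Python sketch below only classifies a source URI by scheme and does not fetch anything. The labels, the example URIs in the usage lines and the `classify_source` function are hypothetical and are not taken from BigInsights.

```python
from urllib.parse import urlparse

# Schemes named on the slide; the descriptive labels are illustrative only.
SUPPORTED_SCHEMES = {
    "http": "web",
    "hdfs": "Hadoop file system",
    "s3": "AWS S3",
    "s3n": "AWS S3 (native)",
}


def classify_source(uri):
    """Return a label for the source type based on the URI scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return SUPPORTED_SCHEMES[scheme]


if __name__ == "__main__":
    for uri in ("http://example.com/data.csv",
                "hdfs://namenode/data/shops.csv",
                "s3n://bucket/shops.csv"):
        print(uri, "->", classify_source(uri))
```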

Page 15: Hadoop Summit Japan 2011 Fall - LT by IBM

Collection

[Diagram: BigSheets on BigInsights, with data sources Internet, Intranet, Logs and Other.]

A complete list of McDonald's in North America.

Page 16: Hadoop Summit Japan 2011 Fall - LT by IBM

[Diagram: BigSheets on BigInsights, with data sources Internet, Intranet, Logs and Other. Labeled steps: Calculate, Reformat, Import.]

A complete list of McDonald's in North America.

Page 17: Hadoop Summit Japan 2011 Fall - LT by IBM

[Diagram: BigSheets on BigInsights, with data sources Internet, Intranet, Logs and Other. Visualizations shown: Column chart, Heat map.]

Page 18: Hadoop Summit Japan 2011 Fall - LT by IBM

BigSheets in Action

• Blockbuster: movie box-office sales forecasting
  – From ABC News

Page 19: Hadoop Summit Japan 2011 Fall - LT by IBM

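Blockbuster – movie box-office sales forecasting with IBM BigInsights/BigSheets

① Take in a feed of the Tweets posted over the weekend (about 200,000),

② and within a few hours (previously, not until Monday morning):
  - create a sales forecast chart
  - run sentiment analysis
  For example, this summer X-Men was more popular (tweeted about more) than any other film.
→ Adjust advertising and screening strategy frequently.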

Page 20: Hadoop Summit Japan 2011 Fall - LT by IBM

Conclusion

• We all need to improve the business.
• So, where would you start with Big Data?

Data Discovery is a key to start improving YOUR Business!

Page 21: Hadoop Summit Japan 2011 Fall - LT by IBM

Thank you!