Big Data for Business - Working with the elephant made easy

Post on 30-Jun-2015


In many Big Data use cases, there is no way around the open source software Hadoop when it comes to processing the large data volumes extracted and collected from diverse data sources. However, using Hadoop also raises challenges of many kinds. Writing Hadoop jobs for data collection and processing is a cumbersome task that requires deep skills, which are not available in most organizations. Further questions are how to visualize the outcomes, what to consider when configuring a Hadoop solution optimally, and how to shorten time to production. Fujitsu knows the answers, simplifies working with Hadoop, and enables even business users without IT knowledge to get in touch with Big Data. Speaker: Dr. Fritz Schinkel (Fujitsu)

Transcript of Big Data for Business - Working with the elephant made easy

0 Copyright 2014 FUJITSU

Human Centric Innovation

Fujitsu Forum 2014

19th – 20th November

Big Data for the Business – Working with the Elephant Made Easy

Dr. Fritz Schinkel, Program Manager for Cloud Infrastructures and Big Data Innovations, Fujitsu

Data Driven Economy

An emerging new world where people, information, things and infrastructure are connected via networks, transforming work and life everywhere

People: Enormous number of individuals

Information: Big Data methods for new value

Infrastructure: Individual end points connected to central compute & storage

Fujitsu’s vision of a Hyperconnected World

People: Improve Living and Empower Individuals


Is our energy system future-proof?


Should we invest in wind energy?

Infrastructure: Transfer, Store and Process Data


Self-driving car: 3.6 TB/h

Smart meters for 80% of EU electricity consumers by 2020

PBs of data from 100 weather satellites

More than 50 billion connected things

Information: Create Insights from Collected Data



Traffic trends

Wind measurement & weather trends

Weather risk assessment


Bringing together the 3 dimensions will realize business and social value

Expectations for Big Data Solutions

People & Business Empowerment: Connect people & empower for business ideas based on information

Creative Intelligence: Create knowledge from information fast enough

Connected Infrastructure: Connect everything, store and process collected data timely

People and Business Empowerment

Start Asking from the Business Side

What is your (new) business approach?

What are you expecting?

What can be earned (business priority)?

What data do you have / need?

What is the expected total size?

What is your production platform?

How will you consolidate your data?

How do you analyze and discover meaning?

Which analytic methods will you apply?

How can you visualize results effectively?



Did you respect security, privacy, regulations?

Which skills do you have / do you need?

Is your concept flexible enough?



Fujitsu Consulting and Services for Big Data

Big Data Assessment Workshop: Understand the opportunities Big Data can bring to your organization through the assessment of your organization’s strategic objectives, processes, and technical assets.

Strategy Consulting: Develop the comprehensive strategy plan and optimal road map needed to efficiently introduce Big Data into your business.

Analytic Service: Fujitsu Big Data Analytics Services help our customers quickly implement new Big Data analytics workflows through a proven, use-case-driven approach.

Hadoop Service: Pragmatic, efficient and assured services for integrating Hadoop into your business.

Integration Service: Establish the solution in your environment and connect it to IT services.

Big Data Consulting Services

• Fujitsu Big Data Assessment Workshop
• Fujitsu Big Data Strategy Consulting
• Fujitsu Big Data Analytics Services

Fujitsu Services for Hadoop

Fujitsu Integration Services

Analytic Services

Customer Intimacy: Improve customer satisfaction and service; increase customer insight

Operational Efficiency: Improve efficiency of processes and reduce cost

Risk Management: Improve fraud detection, cyber security, and compliance

Innovation: Use your data to create new business models, products and services

Adaptable use cases deliver short time-to-value

Example: Weather Trend Analysis

Investment decision for wind park

Predict demography, traffic, wind power

ROI optimized by wind park location

Customer history, open weather data

100 TB of data is expected

Data will be processed on Hadoop

Import customer and weather data

Calculate local trends for wind power

Generate time series per location

Visualize data as map and trend chart



Check compliance for customer data

Basic analytic skills, meteorological skill

Use concept for solar power, insurance …



Connected Infrastructure

Data & Information Flow for Big Data

Sensors: Trace of the real world

Feedback: Actions in the real world

Idea: Creating new business value

Outcome: Real business value

Data usage

Information; Recommendation; Marketing; Product optimization; Decision; Control; …

Data Sources

Corporate Data, History; Public Data, e.g. weather; Internet Usage; Social Networks; Smartphone Usage; Sensors, e.g. in a car; Quantified Self; …

Data store

Private data store; Online / Nearline / Archive; Public data services; Commercial data; …

Modeling: Image of parts of the real world

Data analytics

Aggregating / Cleansing; Modeling

Data processing

Statistics; Correlation; Classification; Prediction; Prescription; …

Big Data Infrastructure Reference Architecture: Choose Platform according to Business Problem

Various data → Consolidated data → Distilled essence → Applied knowledge

Extract, Collect → Cleanse, Transform → Analyze, Visualize → Decide, Act

Data Sources | Analytics Platform | Access

Batch processing platform

Event processing platform

Fast response platform

Databases

Application server






Example: Weather Trend Analysis – Batch Preparation and Real-time Retrieval

Import weather history (50,000 GRIB files)

Invert time series of maps to map of time series (1,000,000 files)

Fast retrieval of time series and visualization

ERA-Interim data

Platform: PRIMEFLEX for Hadoop

Software stack
• Hadoop core: Map Reduce / HDFS
• Streaming and in-memory technologies
• Analytic framework

Hadoop platform sourcing options
• On-premise: Entry or Rack option
• Off-premise: Cloud offering
• Storage- or compute-intensive workloads

Service and Consulting
• Integration Service
• Tool-supported sizing
• Hadoop and Analytic Services

Entry Rack Cloud

Big Data Management


Analytic Services

Integration Service and Sizing

Manage Risk, Gain Value

[Figure: value curves for Classical Business Analytics and Iterative Big Data Analytics]

Incremental investments and agile iterations leverage the steep part of the value curve.

Creative Intelligence

To Be Implemented: Big Data Value Chain

Big Data


Structured & unstructured data


Internet of Things



Research & development, science

Operation, automation,


Interactive reporting, advertising

Structured approach in three steps.

Social media, open data, linked data

Implementation of Big Data Analytics

To be considered

Problem characteristic

Performance: Size / Runtime

Available skills

Implementation alternatives

Optimal Control

Complex Questions

Iterative Analysis

Find the right method for your business

Optimal Control: Map Reduce Programming

Method: Program explicit map / reduce functions

Characteristic

• Structured / unstructured data

• Parallel tasks on input data


• Fits to any size of cluster

• Best resource utilization


• Problem translation to Map / Reduce model

• Programming in Java or a scripting language

Use case examples

Relations, similarities, patterns in large data sets (e.g. clickstream)

Sort and split data along given criteria (e.g. transaction lists)

Invert a table with respect to a certain column (e.g. web index)

Process data in independent chunks (e.g. voice to text)
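
The "invert a table" use case (web index) can be sketched as explicit map and reduce functions. This is a minimal single-process illustration in plain Python, not Hadoop code: the function names, the `shuffle` helper that stands in for the framework's grouping step, and the sample pages are all assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_page(url, text):
    """Map step: emit one (word, url) pair per distinct word on a page."""
    return [(word, url) for word in set(text.lower().split())]


def shuffle(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word(word, urls):
    """Reduce step: collapse the URLs seen for one word into a sorted list."""
    return word, sorted(set(urls))


def build_index(pages):
    """Run map, shuffle and reduce over a {url: text} dictionary."""
    mapped = chain.from_iterable(map_page(u, t) for u, t in pages.items())
    return dict(reduce_word(w, urls) for w, urls in shuffle(mapped).items())


# Hypothetical sample input: page URL -> page text.
pages = {
    "a.html": "wind power forecast",
    "b.html": "wind park location",
}
index = build_index(pages)
```

Because each page is mapped independently and each word is reduced independently, both steps can run in parallel on any number of nodes, which is the property the slide refers to.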

Example: Time Series Transformation


Invert 20,000 weather maps with 1 million grid points to 1 million time series with 20,000 entries

Visualize location based results


Dedicated map reduce job on Hadoop

Visualization based on d3 graphics package


Development map / reduce: 4 days

Development web GUI: 5 days

Execution: 8 node cluster, 2h

HDFS

Map reduce

Transfer data to HDFS (Flume)

Transfer data to web server (NFS)

Visualize data (JavaScript)

Program and execute map reduce (Java)
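
The inversion described on this slide turns 20,000 maps of 1 million grid points each (2 × 10^10 values in total) into one time series per grid point. The sketch below shows the idea on a toy data set in plain Python. It is not the Java job from the showcase: the data layout (a dictionary per map, keyed by grid coordinates) and all names are assumptions for illustration.

```python
from collections import defaultdict


def map_weather_map(timestamp, grid_values):
    """Map step: for one weather map, emit (grid_point, (timestamp, value))."""
    for grid_point, value in grid_values.items():
        yield grid_point, (timestamp, value)


def reduce_grid_point(grid_point, samples):
    """Reduce step: order one grid point's samples by time, keep the values."""
    return grid_point, [value for _, value in sorted(samples)]


def invert(maps):
    """Turn {timestamp: {grid_point: value}} into {grid_point: [values]}."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for timestamp, grid_values in maps.items():
        for grid_point, sample in map_weather_map(timestamp, grid_values):
            grouped[grid_point].append(sample)
    return dict(reduce_grid_point(p, s) for p, s in grouped.items())


# Hypothetical toy input: three maps (deliberately unordered), two grid points.
maps = {
    2: {(0, 0): 11.5, (0, 1): 9.0},
    1: {(0, 0): 10.0, (0, 1): 8.5},
    3: {(0, 0): 12.0, (0, 1): 9.5},
}
series = invert(maps)
```

Sorting inside the reducer matters because the maps may arrive in any order; the grid point is the key, so each time series can be assembled on a different node.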

Complex Questions: SQL (Hive, Impala)

Method: Descriptive SQL queries

Characteristic

• Structured data

• Medium to complex dependencies


• Highest volumes for batch-like execution

• Medium volume for dialog execution


• Problem description in SQL syntax (e.g. Hive or Impala)

• Business knowledge, mathematics, statistics

Use case examples

Find column correlation (e.g. pricing strategy)

Compute statistics and derived values (e.g. averages, median, variance)

Join data from several sources (e.g. transaction data with sentiment data)

Ad-hoc queries in trial phase (e.g. hypothesis verification)

Example: Temperature Weekday Dependency I


Does local average temperature depend on weekday?

Confirm or refute the hypothesis


Run ad-hoc query on Impala database

Do simple visualization in Excel


Development of SQL query: 0.5 days

Visualization in Excel: 2 h

Execution: 8 node cluster, 30 min


Import data to HDFS (Impala)

Download data to PC

Visualize data (Excel)

Specify query (Impala SQL)
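
An ad-hoc query of this kind could look roughly like the sketch below. It runs against an in-memory SQLite database from Python rather than Impala, so the date function (`strftime('%w', ...)`, where 0 means Sunday) is SQLite-specific and would need to be replaced by the corresponding Impala function. The table name, column names and sample rows are assumptions for illustration; the original query is not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (location TEXT, day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?)",
    [
        ("p1", "2014-11-16", 4.0),  # Sunday
        ("p1", "2014-11-17", 6.0),  # Monday
        ("p1", "2014-11-23", 5.0),  # Sunday
        ("p1", "2014-11-24", 7.0),  # Monday
    ],
)

# Average temperature per location and weekday (0 = Sunday in SQLite).
query = """
    SELECT location,
           CAST(strftime('%w', day) AS INTEGER) AS weekday,
           AVG(temp_c) AS avg_temp
    FROM temperature
    GROUP BY location, weekday
    ORDER BY location, weekday
"""
rows = conn.execute(query).fetchall()
```

The result has one row per location and weekday, which is small enough to download and chart in a spreadsheet, as the slide describes.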

Iterative Analytics: Big Data Spreadsheet

Method: Spreadsheet for Big Data

Characteristic

• Structured / unstructured data

• Complex and unknown dependencies


• Highest volumes for batch-like execution

• In-Memory execution for smaller problems


• Select functions and compose formulas

• Business knowledge, mathematics, statistics

Use case examples

Find hidden dependency patterns (e.g. credit fraud behavior)

Learn multivariate dependencies (e.g. decision trees)

Compute statistics and derived values (e.g. averages, median, variance)

Join data from multiple sources (e.g. weather data, traffic, sentiment)

Example: Temperature Weekday Dependency II

Problem: Calculate local average temperature on weekdays

Visualize locations with strong variance (suspected local warming)

Solution: Use Datameer calculation of averages per weekday

Visualize results using integrated Infographics

Visualize hot spots via a web interface based on the d3 graphics package

Realization: Development of Workbook: 2h

Visualization via Infographics: 2h

Development web GUI: 5 days

Execution: 8 node cluster, 3h






Write & run workbook
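
The workbook logic (average per weekday, then the span between the warmest and coldest weekday average per location) can be approximated in a few lines of plain Python. This is only a stand-in to make the calculation concrete; it does not use Datameer, and the record layout, function name and sample values are assumptions.

```python
from collections import defaultdict
from datetime import date


def weekday_span(records):
    """Return {location: warmest weekday mean minus coldest weekday mean}."""
    sums = defaultdict(lambda: [0.0, 0])  # (location, weekday) -> [sum, count]
    for location, day, temp in records:
        cell = sums[(location, day.weekday())]
        cell[0] += temp
        cell[1] += 1
    means = defaultdict(list)  # location -> list of weekday means
    for (location, _), (total, count) in sums.items():
        means[location].append(total / count)
    return {loc: max(m) - min(m) for loc, m in means.items()}


# Hypothetical sample records: (location, date, temperature in °C).
records = [
    ("p1", date(2014, 11, 16), 4.0),
    ("p1", date(2014, 11, 17), 6.0),
    ("p1", date(2014, 11, 23), 5.0),
    ("p1", date(2014, 11, 24), 7.0),
    ("p2", date(2014, 11, 16), 3.0),
    ("p2", date(2014, 11, 17), 3.5),
]
spans = weekday_span(records)
# Locations ordered by span, largest first, as in the "hot spot" list.
ranking = sorted(spans, key=spans.get, reverse=True)
```

Locations at the top of `ranking` are the candidates to inspect on the map.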


Fujitsu’s PRIMEFLEX for Hadoop at a Glance

Complexity made easy: Get in touch with Big Data, see what is possible.

Consult & implement: Consulting and service program from strategy to implementation

Collect, Visualize, Understand

Choice of analytics for highest control or highest comfort

Store & Compute: Integrated and optimally sized on-premise or off-premise infrastructure

Fujitsu Showcase: Weekdays and Weather (1)

Select connection and file type for import

Fujitsu Showcase: Weekdays and Weather (2)


Fujitsu Showcase: Weekdays and Weather (3)

Select and modify imported fields

Fujitsu Showcase: Weekdays and Weather (4)

Define execution plan …

… save and start

Fujitsu Showcase: Weekdays and Weather (5)

Import is executed on the complete cluster asynchronously as


Fujitsu Showcase: Weekdays and Weather (6)

Create new workbook and add imported data

Fujitsu Showcase: Weekdays and Weather (7)

Create new tab and start analytics

Fujitsu Showcase: Weekdays and Weather (8)

Specify formulas and see results (on representative sample data) immediately

When all is complete, save workbook and press “run”

Fujitsu Showcase: Weekdays and Weather (9)

Create new infographic …

… drag new widgets into your graphic …

… and bind them to data …

Fujitsu Showcase: Weekdays and Weather (10)

Configure your widgets step by step

Fujitsu Showcase: Weekdays and Weather (11)

Get complete page automatically published

Locations with most significant span between warmest and coldest weekday average, as map and as list

Number of grid points with maximum / minimum temperature on certain weekday

Locations with most significant span between warmest and coldest weekday average and warmest day on a certain weekday

Fujitsu Showcase: Weekdays and Weather (12)

Visualization GUI to study the span of weekday mean temperature in certain places and to look for possible reasons

Map colored for high span of weekday mean temperature

Fujitsu Showcase: Weekdays and Weather (13)

Sliders for span threshold, contrast and opacity of coloring.

Fujitsu Showcase: Weekdays and Weather (14)

And an adjustment for grid points with low temperature span over the complete observation time.

Fujitsu Showcase: Weekdays and Weather (15)

Using the color settings and zooming into the map, we can find areas with significant differences of weekday mean values in the observed timeframe

Fujitsu Showcase: Weekdays and Weather (16)

Clicking on a certain position shows the curve of average temperature for the weekdays, the coordinates and the total min/max temperature of the point

Fujitsu Showcase: Weekdays and Weather (17)

Map and satellite can be used to find possible reasons for mean temperature related to


Fujitsu Showcase: Weekdays and Weather (18)

Zoom into the source of the color cloud.

Industrial complex is shut down on Sunday?

Fujitsu Showcase: Weekdays and Weather (19)

US East Coast is cooler on Sunday / Monday.

Is the traffic system heating the atmosphere over the week?

Fujitsu Showcase: Weekdays and Weather (20)

South of Hudson Bay is an area with Wednesday mean temperature approx. 1 °C higher than on Saturday.

Does the wood industry influence the temperature in the rhythm of the week?
