Big Data for Business - Working with the elephant made easy
0 Copyright 2014 FUJITSU
Human Centric Innovation
Fujitsu Forum 2014
19th – 20th November
Big Data for the Business – Working with the Elephant Made Easy
Dr. Fritz Schinkel
Program Manager for Cloud Infrastructures and Big Data Innovations, Fujitsu
Data Driven Economy
An emerging new world where
people, information, things and
infrastructure are connected via
networks, transforming work and
life everywhere
People: Enormous number of individuals
Information: Big Data methods for new value
Infrastructure: Individual end points connected to central compute & storage
Fujitsu’s vision of a Hyperconnected World
People: Improve Living and Empower Individuals
Is our energy system future-proof?
Should we invest in wind energy?
Infrastructure: Transfer, Store and Process Data
Self-driving car: 3.6 TB/h
Smart meters for 80% of EU electricity consumers by 2020
PBs of data from 100 weather satellites
More than 50 billion connected things
Information: Create Insights from Collected Data
Demography prediction
Traffic trends
Wind measurement & weather trends
Weather risk assessment
Bringing together the 3 dimensions will realize business and social value
Expectations for Big Data Solutions
People & Business Empowerment: Connect people & empower for business ideas based on information
Creative Intelligence: Create knowledge from information fast enough
Connected Infrastructure: Connect everything, store and process collected data in a timely manner
People and Business Empowerment
Start Asking from the Business Side
Value:
What is your (new) business approach?
What are you expecting?
What can be earned (business priority)?
Platform:
What data do you have / need?
What is the expected total size?
What is your productive platform?
Tools:
How will you consolidate your data?
How do you analyze and discover meaning?
Which analytic methods will you apply?
How can you visualize results effectively?
Misc.:
Did you respect security, privacy, regulations?
Which skills do you have / do you need?
Is your concept flexible enough?
Fujitsu Consulting and Services for Big Data
Big Data Assessment Workshop: Understand the opportunities Big Data can bring to your organization through the assessment of your organization’s strategic objectives, processes, and technical assets.
Strategy Consulting: Develop the comprehensive strategy plan and optimal road map needed to efficiently introduce Big Data into your business.
Analytic Service: Fujitsu Big Data Analytics Services help our customers quickly implement new Big Data analytics workflows through a proven use-case-driven approach.
Hadoop Service: Pragmatic, efficient and assured services for integrating Hadoop into your business.
Integration Service: Establish the solution in your environment and connect it to IT services.
Big Data Consulting Services
• Fujitsu Big Data Assessment Workshop
• Fujitsu Big Data Strategy Consulting
• Fujitsu Big Data Analytics Services
Fujitsu Services for Hadoop
Fujitsu Integration Services
Analytic Services
Categories
Customer Intimacy: Improve customer satisfaction and service, increase customer insight
Operational Efficiency: Improve efficiency of processes and reduce cost
Risk Management: Improve fraud detection, cyber security, and compliance
Innovation: Use your data to create new business models, products and services
Adaptable use cases deliver short time-to-value
Example: Weather Trend Analysis
Value:
Investment decision for wind park
Predict demography, traffic, wind power
ROI optimized by wind park location
Platform:
Customer history, open weather data
100 TB of data is expected
Data will be processed on Hadoop
Tools:
Import customer and weather data
Calculate local trends for wind power
Generate time series per location
Visualize data as map and trend chart
Misc.:
Check compliance for customer data
Basic analytic skills, meteorological skill
Use concept for solar power, insurance …
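The step "calculate local trends" can be pictured with a small sketch. It is only an illustration, not the method used in the talk: it assumes equally spaced yearly samples and takes the least-squares slope as the trend. The function names, site names and wind speeds are hypothetical.

```python
from collections import defaultdict


def linear_trend(values):
    """Least-squares slope of values against their index (units per time step)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def trends_per_location(records):
    """records: iterable of (location, year, wind_speed) in any order."""
    series = defaultdict(list)
    # Sort by location and year so each series is in time order.
    for location, _year, speed in sorted(records, key=lambda r: (r[0], r[1])):
        series[location].append(speed)
    return {loc: linear_trend(vals) for loc, vals in series.items()}


# Hypothetical yearly mean wind speeds (m/s) for two made-up sites.
records = [
    ("site_a", 2010, 6.0), ("site_a", 2011, 6.5), ("site_a", 2012, 7.0),
    ("site_b", 2012, 6.0), ("site_b", 2010, 8.0), ("site_b", 2011, 7.0),
]
trends = trends_per_location(records)
for location, slope in sorted(trends.items()):
    print(location, slope)
```

A positive slope marks a location where wind speed rose over the observed years, a negative slope one where it fell.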
Connected Infrastructure
Data & Information Flow for Big Data
Sensors: Trace of the real world
Feedback: Actions in the real world
Idea: Creating new business value
Outcome: Real business value
Modeling: Image of parts of the real world
Data sources: Corporate data, history; public data, e.g. weather; Internet usage; social networks; smartphone usage; sensors, e.g. in a car; quantified self; …
Data store: Private data store; online / nearline / archive; public data services; commercial data; …
Data analytics: Aggregating / cleansing; modeling
Data processing: Statistics; correlation; classification; prediction; prescription; …
Data usage: Information; recommendation; marketing; product optimization; decision; control; …
Big Data Infrastructure Reference Architecture: Choose Platform according to Business Problem
Data stages: Various data, Consolidated data, Distilled essence, Applied knowledge
Process steps: Extract, Collect; Cleanse, Transform; Analyze, Visualize; Decide, Act
Data Sources: Databases, Application server, Web content, Sensor data
Analytics Platform: Batch processing platform, Event processing platform, Fast response platform
Access: Apps, Services, Queries; Visualization, Reporting; Notification
Example: Weather Trend Analysis – Batch Preparation and Real-time Retrieval
Import weather history (50,000 GRIB files)
Invert time series of maps to map of time series (1,000,000 files)
Fast retrieval of time series and visualization
ERA-Interim data
Platform: PRIMEFLEX for Hadoop
Software stack: Hadoop core (MapReduce / HDFS); streaming and in-memory technologies; analytic framework
Hadoop platform sourcing options: On-premise (Entry or Rack option); off-premise (Cloud offering); storage- or compute-intensive workloads
Service and Consulting: Integration Service; tool-supported sizing; Hadoop and Analytic Services
[Diagram labels: Entry, Rack, Cloud; Big Data Management; Analytics; Analytic Services; Integration Service and Sizing]
Manage Risk, Gain Value
[Figure: invest / return over time, comparing Classical Business Analytics with Iterative Big Data Analytics. Phases shown: hardware, ETL, analysis, operate. Returns shown: value 1 to value 5.]
Incremental investments and agile iterations leverage the steep part of the value curve.
Creative Intelligence
To Be Implemented: Big Data Value Chain
Big Data
Structured approach in three steps.
Extract, Collect
Cleanse, Transform, Analyze
Find, Decide, Act
Structured & unstructured data
Devices, sensors, Internet of Things
Social media, open data, linked data
Research & development, science
Operation, automation, production
Interactive reporting, advertising
Implementation of Big Data Analytics
To be considered
Problem characteristic
Performance: Size / Runtime
Available skills
Implementation alternatives
Optimal Control
Complex Questions
Iterative Analysis
Find the right method for your business
Optimal Control: Map Reduce Programming
Method: Program explicit map / reduce functions
Characteristic
• Structured / unstructured data
• Parallel tasks on input data
Performance
• Fits to any size of cluster
• Best resource utilization
Skills
• Problem translation to Map / Reduce model
• Programming in Java or a scripting language
Use case examples
Relations, similarities, patterns in large data sets (e.g. clickstream)
Sort and split data along given criteria (e.g. transaction lists)
Invert a table with respect to a certain column (e.g. web index)
Process data in independent chunks (e.g. voice to text)
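To make the "invert a table" use case concrete, here is a minimal single-process sketch of the map / reduce model in Python. It is not Hadoop code: the shuffle() helper stands in for the grouping a framework performs between the two phases, and all function names and the sample pages are hypothetical.

```python
from collections import defaultdict


def map_page(url, text):
    """Map step: emit one (word, url) pair per distinct word on the page."""
    for word in set(text.lower().split()):
        yield word, url


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word(word, urls):
    """Reduce step: collapse all URLs seen for one word into a sorted list."""
    return word, sorted(set(urls))


def build_index(pages):
    """Run map, shuffle and reduce in sequence over a {url: text} table."""
    pairs = (pair for url, text in pages.items() for pair in map_page(url, text))
    return dict(reduce_word(word, urls) for word, urls in shuffle(pairs).items())


# Hypothetical two-page "web".
pages = {"a.html": "big data hadoop", "b.html": "big elephant"}
index = build_index(pages)
print(index["big"])
```

Each map call only sees one page and each reduce call only sees one word, which is what lets a cluster run them in parallel on independent chunks.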
Example: Time Series Transformation
Problem:
Invert 20,000 weather maps with 1 million grid points each into 1 million time series with 20,000 entries each
Visualize location based results
Solution:
Dedicated map reduce job on Hadoop
Visualization based on d3 graphics package
Realization:
Development map / reduce: 4 days
Development web GUI: 5 days
Execution: 8 node cluster, 2h
Workflow:
Transfer data to HDFS (Flume)
Program and execute MapReduce (Java) on HDFS
Transfer data to web server (NFS)
Visualize data (JavaScript)
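The inversion itself fits the map / reduce model directly. The talk's job was written in Java on Hadoop, so the Python sketch below is only an illustration of the idea under simplifying assumptions: each map is a flat list of values, a grid point is its position in that list, and a local dictionary replaces the cluster-wide grouping. All names and numbers are hypothetical.

```python
from collections import defaultdict


def map_weather_map(time_index, grid_values):
    """Map step: one weather map in, one (grid_point, (time, value)) pair per point out."""
    for grid_point, value in enumerate(grid_values):
        yield grid_point, (time_index, value)


def reduce_grid_point(samples):
    """Reduce step: order one grid point's samples by time and keep the values."""
    return [value for _time, value in sorted(samples)]


def invert(maps):
    """Turn a sequence of (time_index, map) into {grid_point: time series}."""
    grouped = defaultdict(list)  # stands in for the framework's shuffle phase
    for time_index, grid_values in maps:
        for grid_point, sample in map_weather_map(time_index, grid_values):
            grouped[grid_point].append(sample)
    return {point: reduce_grid_point(samples) for point, samples in grouped.items()}


# Three tiny maps with four grid points each, arriving out of time order.
maps = [
    (2, [12.0, 22.0, 32.0, 42.0]),
    (0, [10.0, 20.0, 30.0, 40.0]),
    (1, [11.0, 21.0, 31.0, 41.0]),
]
series = invert(maps)
print(series[0])
```

With the sizes quoted on the slide, the same job moves 20,000 × 1,000,000 = 2 × 10^10 values, which is why the grouping is left to the cluster.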
Complex Questions: SQL Hive, Impala
Method: Descriptive SQL queries
Characteristic
• Structured data
• Medium to complex dependencies
Performance
• Highest volumes for batch-like execution
• Medium volume for dialog execution
Skills
• Problem description in SQL syntax (e.g. Hive or Impala)
• Business knowledge, mathematics, statistics
Use case examples
Find column correlation (e.g. pricing strategy)
Compute statistics and derived values (e.g. averages, median, variance)
Join data from several sources (e.g. transaction data with sentiment data)
Ad-hoc queries in trial phase (e.g. hypothesis verification)
Example: Temperature Weekday Dependency I
Problem:
Does local average temperature depend on weekday?
Confirm or disprove the hypothesis
Solution:
Run ad-hoc query on Impala database
Do simple visualization in Excel
Realization:
Development of SQL query: 0.5 days
Visualization in Excel: 2 h
Execution: 8 node cluster, 30 min
Workflow:
Import data to HDFS (Impala)
Specify query (Impala SQL)
Download data to PC
Visualize data (Excel)
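The shape of such an ad-hoc query can be tried out locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Impala, so the date function is SQLite's strftime and would need to be replaced by Impala's own date functions on a cluster. The table name, column names and temperatures are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Impala table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (location TEXT, day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO temperature VALUES (?, ?, ?)",
    [
        ("p1", "2014-11-16", 4.0),  # Sunday
        ("p1", "2014-11-17", 6.0),  # Monday
        ("p1", "2014-11-23", 5.0),  # Sunday
        ("p1", "2014-11-24", 7.0),  # Monday
    ],
)

# Average temperature per location and weekday; SQLite's %w yields 0 for Sunday.
query = """
    SELECT location,
           CAST(strftime('%w', day) AS INTEGER) AS weekday,
           AVG(temp_c) AS avg_temp
    FROM temperature
    GROUP BY location, CAST(strftime('%w', day) AS INTEGER)
    ORDER BY location, weekday
"""
rows = conn.execute(query).fetchall()
for location, weekday, avg_temp in rows:
    print(location, weekday, avg_temp)
conn.close()
```

The result is one small row per location and weekday, which is the kind of output that can be downloaded and charted in a spreadsheet.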
Iterative Analytics: Big Data Spreadsheet
Method: Spreadsheet for Big Data
Characteristic
• Structured / unstructured data
• Complex and unknown dependencies
Performance
• Highest volumes for batch-like execution
• In-Memory execution for smaller problems
Skills
• Select functions and compose formulas
• Business knowledge, mathematics, statistics
Use case examples
Find hidden dependency patterns (e.g. credit fraud behavior)
Learn multivariate dependencies (e.g. decision trees)
Compute statistics and derived values (e.g. averages, median, variance)
Join data from multiple sources (e.g. weather data, traffic, sentiment)
Example: Temperature Weekday Dependency II
Problem:
Calculate local average temperature on weekdays
Visualize locations with strong variance (suspected local warming)
Solution:
Use Datameer calculation of averages per weekday
Visualize results using integrated Infographics
Visualize hot spots via web interface based on d3 graphics package
Realization:
Development of Workbook: 2 h
Visualization via Infographics: 2 h
Development web GUI: 5 days
Execution: 8 node cluster, 3 h
Workflow:
Import into Hadoop (Datameer)
Write & run Workbook
Infographic
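The quantity behind the hot-spot view can be sketched in a few lines. This is not Datameer workbook code; it is a plain Python illustration that assumes the measure of interest is the span between the warmest and coldest weekday average per location, as the showcase captions describe. Function names, grid point names and temperatures are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekday_means(samples):
    """samples: iterable of (day, temp). Returns {weekday: mean}, Monday = 0."""
    sums = defaultdict(lambda: [0.0, 0])
    for day, temp in samples:
        acc = sums[day.weekday()]
        acc[0] += temp
        acc[1] += 1
    return {weekday: total / count for weekday, (total, count) in sums.items()}


def weekday_span(samples):
    """Difference between the warmest and the coldest weekday average."""
    means = weekday_means(samples)
    return max(means.values()) - min(means.values())


def rank_by_span(data):
    """data: {location: samples}. Returns (location, span) pairs, largest span first."""
    spans = {location: weekday_span(samples) for location, samples in data.items()}
    return sorted(spans.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples: two Sundays and two Wednesdays per grid point.
data = {
    "grid_1": [(date(2014, 11, 16), 4.0), (date(2014, 11, 19), 5.0),
               (date(2014, 11, 23), 4.0), (date(2014, 11, 26), 5.0)],
    "grid_2": [(date(2014, 11, 16), 3.0), (date(2014, 11, 19), 3.5),
               (date(2014, 11, 23), 3.0), (date(2014, 11, 26), 3.5)],
}
ranking = rank_by_span(data)
print(ranking)
```

Locations at the top of the ranking are the candidates worth inspecting on the map.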
Fujitsu’s PRIMEFLEX for Hadoop at a Glance
Complexity made easy: Get in touch with Big Data, see what is possible.
Consult & implement: Consulting and service program from strategy to implementation
Collect, Understand, Visualize: Choice of analytics for highest control or highest comfort
Store & Compute: Integrated and optimally sized on-premise or off-premise infrastructure
Showcase
Fujitsu Showcase: Weekdays and Weather (1)
Select connection and file type for import
Fujitsu Showcase: Weekdays and Weather (2)
Configure import
Fujitsu Showcase: Weekdays and Weather (3)
Select and modify imported fields
Fujitsu Showcase: Weekdays and Weather (4)
Define execution plan … save and start
Fujitsu Showcase: Weekdays and Weather (5)
Import is executed on the complete cluster asynchronously as planned
Fujitsu Showcase: Weekdays and Weather (6)
Create new workbook and add imported data
Fujitsu Showcase: Weekdays and Weather (7)
Create new tab and start analytics
Fujitsu Showcase: Weekdays and Weather (8)
Specify formulas and see results (on representative sample data) immediately
When all is complete, save workbook and press “run”
Fujitsu Showcase: Weekdays and Weather (9)
Create new infographic … drag new widgets into your graphic … and bind them to data …
Fujitsu Showcase: Weekdays and Weather (10)
Configure your widgets step by step
Fujitsu Showcase: Weekdays and Weather (11)
Get complete page automatically published
Locations with most significant span between warmest and coldest weekday average, as map and as list
Number of grid points with maximum / minimum temperature on a certain weekday
Locations with most significant span between warmest and coldest weekday average and warmest day on a certain weekday
Fujitsu Showcase: Weekdays and Weather (12)
Visualization GUI to study the span of weekday mean temperature in certain places and to look for possible reasons
Map colored for high span of weekday mean temperature
Fujitsu Showcase: Weekdays and Weather (13)
Sliders for span threshold, contrast and opacity of coloring.
Fujitsu Showcase: Weekdays and Weather (14)
And an adjustment for grid points with low temperature span over the complete observation time.
Fujitsu Showcase: Weekdays and Weather (15)
Using the color settings and zooming into the map, we can find areas with significant differences in weekday mean values over the observed timeframe.
Fujitsu Showcase: Weekdays and Weather (16)
Clicking on a position shows the curve of average temperature for the weekdays, the coordinates and the total min/max temperature of the point.
Fujitsu Showcase: Weekdays and Weather (17)
Map and satellite views can be used to find possible reasons for weekday-related differences in mean temperature.
Fujitsu Showcase: Weekdays and Weather (18)
Zoom into the source of the color cloud.
Industrial complex is shut down on Sunday?
Fujitsu Showcase: Weekdays and Weather (19)
US East Coast is cooler on Sunday / Monday.
Is the traffic system heating the atmosphere over the week?
Fujitsu Showcase: Weekdays and Weather (20)
South of Hudson Bay is an area with a Wednesday mean temperature approx. 1 °C higher than on Saturday.
Does the wood industry influence the temperature in the rhythm of the week?