Human Face of Big Data - Archive...Big Data Analytics portfolio including SAP Real-time Data...
Transcript of Human Face of Big Data - Archive...Big Data Analytics portfolio including SAP Real-time Data...
1 © 2013 SAP AG or an SAP affiliate company. All rights reserved.
Welcome!
Big Data - Introduction to SAP Big Data Technologies
Big Data - Streaming Analytics
Big Data - Smarter Data Virtualization
Big Data - Gain New Insight from Hadoop
Big Data - Spatial Data Processing for Richer Insights
Big Data - Text Analytics
SAP Big Data Webinar Series
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 3
Speaker Introduction
Yuvaraj Athur Raghuvir, is a Senior Director in the SAP
HANA Platform Solution Management at SAP. He leads the
Big Data Analytics portfolio including SAP Real-time Data
Platform.
Yuvaraj has over 14 years of experience spanning Business
Applications, Business Analytic Solutions, Architecture and
Engineering.
Presented by: Yuvaraj Athur Raghuvir, SAP HANA Platform July 24 2013
Gain New insight from Hadoop
SAP Big Data Webinar Series
SHOPPERS GET FASHION ADVICE THAT FITS THEIR STYLE
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 6
Big Data Economics
Predictive incl. data mining and machine learning
“Unstructured” Analysis Incl. text, media, spatial etc.
Scalable Storage
Streaming Data incl. Sensors, Social and Mobile
Urgent Need
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 7
Open Source Community Gift: Apache Hadoop
Predictive incl. data mining and machine learning
“Unstructured” Analysis Incl. text, media, spatial etc.
Scalable Storage
Streaming Data incl. Sensors, Social and Mobile
Urgent Need
Apache Hadoop
► [Commons]
► [Projects]
► [Distributions]
► [Data Scientists]
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 8
Hadoop Possibilities
Source: 1) Gartner Blog http://blogs.gartner.com/merv-adrian/2013/02/23/hadoop-2013-part-three-platforms/ retrieved on July 17 2013 2) Gartner Blog http://blogs.gartner.com/merv-adrian/2013/03/08/hadoop-2013-part-four-players/ retrieved on July 17 2013 3) GitHub query “Hadoop” retrived on July 17 2013
Amazon has reported that it started 2 million Elastic MapReduce (EMR) clusters – in a single year.
Pushing New Boundaries! Growing Popularity
among Vendors ► Storage ► Compute ► Network ► High Performance
Computing Vendors
Dynamic Community with over 2500+ Hadoop related projects in GitHub.
9 © 2013 SAP AG or an SAP affiliate company. All rights reserved.
Hadoop Community
Source: GitHub query “Hadoop” retrived on July 17 2013
Complexity!
2500+ Community Projects!!
Source: GigaOm Infographic, Mar 5, retrieved on July 17 2013
Highly Dynamic & Evolving Ecosystem
Projects & Alternatives
Source: Gartner Blog http://blogs.gartner.com/merv-adrian/2013/02/21/hadoop-2013-part-two-projects/ retreived on July 17 2013
Gartner Infographic
GigaOm Infographic
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 10
The Big Data Phenomenon Big Data is more than just Hadoop…
Exploding data volumes
Accelerating data velocity
Increasing data variety
Business Trends Technology Trends
Storage / Memory / CPU advances
Data Mining/Predictive analysis
In-memory computing
Hadoop & distributed MPP
Complex event processing
“Enterprise” Big
Data
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 11
Big Data Challenges Business and technical needs
Business Needs Quick insights from all business-relevant data Pick the right action among many choices
Action 2
Action 1
Action 3
Quick Insight Right Action All Relevant Big Data
100101 011010 100101
Technical Challenges
Cost of store vs. cost to process data considerations
Pressure to gain insight quickly from data
Diversity of data formats makes it difficult to analyze
Need to have disparate technologies interoperate across enterprise
Determine the right action among many valuable insights across variety, volume, velocity and technology is difficult
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 12
How to Capitalize on the Big Data Opportunity and Address Big Data Technical Challenges?
To deploy an integrated data processing framework
To enable real-time, actionable insights in business process
context
To derive new value from INFORMATION
Optimize data management in each phase of the information lifecycle process
Regardless of data source, processing technologies, latency challenges, number of user demands
Marry business process insights from structured data analysis with deep pattern, behavior analysis of unstructured data
Enable decision making based on multi-factor considerations, not just instinct/experience
Focus on deriving new value from data by enabling new business and technology use cases previously not feasible
Augment existing business scenarios with new data insights to enable better decision
13 © 2013 SAP AG or an SAP affiliate company. All rights reserved.
Hadoop
Data storage (Hadoop Distributed File system)
Job Management
Computation Engine(s)
BI and analytics software from SAP
In-memory
Disk-based data ware-house (SAP Sybase IQ)
… and/or ...
Analytic engine
Analytic engine
Non-SAP solutions
SAP® Solutions
Data warehouse/database (SAP HANA, SAP Sybase IQ/SAP Sybase
ASE)
SAP Data Services
SAP Business Suite
Other SAP solutions
Hadoop as a flexible data store
Hadoop as a simple database
Hadoop as a processing engine
Hadoop for data analytics Reference Data
Streaming Data
Enterprise Data
Transaction Data
Social Media
Enterprise Scenarios with Hadoop
14 © 2013 SAP AG or an SAP affiliate company. All rights reserved.
Hadoop
Data storage (Hadoop Distributed File system)
Job Management
Computation Engine(s)
Hadoop as a flexible data store
Reference Data
Streaming Data
Enterprise Data
Transaction Data
Social Media
Hadoop as a Simple Database
SAP® Solutions
Data warehouse/database (SAP HANA, SAP Sybase IQ/SAP Sybase
ASE)
SAP Data Services
SAP Business Suite
Other SAP solutions
Scenarios of Use include: ► Extract-Transform-Load from other systems
to Hadoop. ► SAP Data Services provides ETL
support from Hadoop to SAP HANA. ► Store & Retrieve Structured data based on
projects like Hive. ► Depending on the scenario, scalable
analytical systems like Sybase IQ can also be considered.
► Handle large documents as “Blobs” to do retrievals or analytics later.
► Use as a near-line store for offloading data that is considered “cold” or “frozen”
► Data lifecycle management is typically manual.
Focus: Storage and Retrieval of data from Hadoop typically using interfaces provided by HIVE or direct HDFS access
Hadoop as a simple database
15 © 2013 SAP AG or an SAP affiliate company. All rights reserved.
Hadoop
Data storage (Hadoop Distributed File system)
Job Management
Computation Engine(s)
Hadoop as a flexible data store
Reference Data
Streaming Data
Enterprise Data
Transaction Data
Social Media
Hadoop as a Processing Engine
SAP® Solutions
Data warehouse/database (SAP HANA, SAP Sybase IQ/SAP Sybase
ASE)
SAP Data Services
SAP Business Suite
Other SAP solutions
Scenarios of Use include: ► Data Enrichment.
► Push down of Text Data Transforms from Data Services is an example
► Data Pattern Analysis. ► This is an emerging space across
new data forms. ► Convergence between procedural
data science and declarative access patterns are evolving.
Focus: Distributed Compute leveraging the Map-Reduce computation framework of Hadoop on large distributed data sets
Hadoop as a processing engine
16 © 2013 SAP AG or an SAP affiliate company. All rights reserved.
Hadoop
Data storage (Hadoop Distributed File system)
Job Management
Computation Engine(s)
Hadoop as a flexible data store
Reference Data
Streaming Data
Enterprise Data
Transaction Data
Social Media
Hadoop for Data Analytics
Scenarios of Use include: ► Cross DM analytics.
► Practical only when performance from Hadoop is acceptable
► Stand-alone analytics. ► Emerging area to use Hadoop as the
direct data store for analytics
Focus: A combination of storage and delegated analytics supporting two approaches: • Two-Phase Analytics: Background
processing engine refine and feeding data • Federated Queries: Client side federation
across data stores
BI and analytics software from SAP
In-memory Disk-based data ware-house (SAP Sybase IQ)
… and/or ...
Analytic engine Analytic
engine
Hadoop for data analytics
© 2013 SAP AG. All rights reserved. 17
SAP HANA SP6 - smart data access capability Data virtualization for on-premise and hybrid cloud environments
Benefits Enables access to remote data access
just like “local” table Provides SAP HANA to SAP HANA queries Smart query processing including query decomposition
with predicate push-down, functional compensation Supports data location agnostic development No special syntax to access heterogeneous data sources Non-disruptive evolution
Heterogeneous data sources SAP HANA to Hadoop (Hive) Teradata SAP Sybase ASE SAP Sybase IQ
Transactions + Analytics
Teradata
Hadoop
SAP HANA
ASE
IQ
SAP HANA
Virtual Tables HANA Tables
New
© 2013 SAP AG. All rights reserved. 18
Example Reference Architecture : Machine-to-Machine Infrastructure Run-Time Architecture
Stream
Batch
Synchronization
Cloud / Backend
M2M Application Server Application and
Analytical Services
M2M Services
Core Services Industry-Specific Services
End User Apps
Mobile App Web App Dashboard
Edge
Data Acquisition & Processing
Device Management
SQLA / UltraLite
Data Persistence
Device
Real-Time Data Platform
ESP Event Processing
SQLA Data Synchronization
HANA
“Hot Data”
ASE / IQ “Warm and Cold Data”
Predictive Models
Data Models
Hadoop
Big Data Sets
Cellular M2M revenue opportunity projected to reach $1.2 Trillion by 2020
Meter Data Management will exceed $420 Million by 2020 with a CAGR of 16.8%[2]
Annual smart meter shipments to surpass 140 million units worldwide by 2016, representing a CAGR of 32.9%.[3]
Picture: Beecham Research 1. GSMA; 2.Pike Research; 3. IDC Energy Insights
Healthcare
Public Sector
Industrial
MRI, PDAs
Implants, Surgical Equipment
Pumps, Monitors
Telemedicine, etc.
Tolls, etc.
Planes, Signage
Vehicles, Lights, Ships Tanks, Fighter Jets
Battlefield Comms
Jeeps, Cars, Ambulances
Breakdown, Lone Worker Homeland Security Environ. Monitoring,
etc.
Pumps, Valves, Vats, Conveyors, Pipelines
Motors, Drives, Converting, Fabrication
Assembly/Packaging, Vessels/Tanks, etc.
Turbines
Windmills
UPS
Batteries
Generators
Meters, Drills
Fuel Cells, etc.
Beyond Business Networks – Internet of Things
DRIVERS DIVERTED BEFORE FATAL ACCIDENTS HAPPEN
© 2013 SAP AG or an SAP affiliate company. All rights reserved.
Thank You!
SAP Big Data Webinar Series
Presented by: Yuvaraj Athur Raghuvir, SAP HANA Platform
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 22
Hadoop Commons : Core / Common Components
Hadoop Distributed File System: HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.
MapReduce: MapReduce is a software framework that serves as the compute layer of Hadoop. MapReduce jobs are divided into two (obviously named) parts. The “Map” function divides a query into multiple parts and processes data at the node level. The “Reduce” function aggregates the results of the “Map” function to determine the “answer” to the query.
Hive: Hive is a Hadoop-based data warehousing-like framework originally developed by Facebook. It allows users to write queries in a SQL-like language caled HiveQL, which are then converted to MapReduce. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as Microstrategy, Tableau, Revolutions Analytics, etc.
Pig: Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL.)
HBase: HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop. It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. EBay and Facebook use HBase heavily.
Source: http://wikibon.org/wiki/v/HBase,_Sqoop,_Flume_and_More:_Apache_Hadoop_Defined retrieved on Jul 17 2013
Flume: Flume is a framework for populating Hadoop with data. Agents are populated throughout ones IT infrastructure – inside web servers, application servers and mobile devices, for example – to collect data and integrate it into Hadoop.
Oozie: Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages – such as Map Reduce, Pig and Hive -- then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after specified previous jobs on which it relies for data are completed.
© 2013 SAP AG or an SAP affiliate company. All rights reserved. 23
Hadoop Commons : Core / Common Components
Ambari: Ambari is a web-based set of tools for deploying, administering and monitoring Apache Hadoop clusters. It's development is being led by engineers from Hortonwroks, which include Ambari in its Hortonworks Data Platform.
Avro: Avro is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing removed procedure calls.
Mahout: Mahout is a data mining library. It takes the most popular data mining algorithms for performing clustering, regression testing and statistical modeling and implements them using the Map Reduce model.
Sqoop: Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop. It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target.
HCatalog: HCatalog is a centralized metadata management and sharing service for Apache Hadoop. It allows for a unified view of all data in Hadoop clusters and allows diverse tools, including Pig and Hive, to process any data elements without needing to know physically where in the cluster the data is stored.
BigTop: BigTop is an effort to create a more formal process or framework for packaging and interoperability testing of Hadoop's sub-projects and related components with the goal improving the Hadoop platform as a whole.
Source: http://wikibon.org/wiki/v/HBase,_Sqoop,_Flume_and_More:_Apache_Hadoop_Defined retrieved on Jul 17 2013