The New Possible: Very Big Data for Serious Business Value

Transcript of The New Possible: Very Big Data for Serious Business Value

Twitter Tag: #briefr

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

•  November: Cloud
•  December: Innovators
•  January: Big Data
•  February: Performance
•  March: Integration

•  Databases were designed primarily to store information for retrieval at a later time.
•  Big Data requires big databases.
•  The convergence of multi-structured data and the need to perform both transactional and operational analytics has led to substantial innovations in database technologies.
•  Today some of the biggest databases blend the best of both worlds, transforming the way organizations store and analyze enterprise data.

Robin Bloor is Chief Analyst at The Bloor Group.

•  German-founded SAP is one of the largest software companies in the world. Its best-known products are SAP ERP, SAP Business Warehouse, SAP Business Objects, SAP Sybase IQ and SAP HANA.
•  SAP offers a comprehensive set of database management solutions that spans the needs of the enterprise, leveraging in-memory, cloud and mobile technologies.
•  Recent innovations include a Big Data analytics platform that loads, processes and delivers massive amounts of multi-structured data and is accessible on demand enterprise-wide.

Courtney Claussen is a product manager at Sybase, Inc., concentrating on Sybase's data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support, and product marketing in the areas of computer-aided design, computer-aided software engineering, database management systems, middleware, and analytics.

CONFIDENTIAL

The New Possible: Very Big Data for Serious Business Value
The Briefing Room with Dr. Robin Bloor and SAP

October 9, 2012

© 2012 SAP AG. All rights reserved.

AGENDA

•  Big Data Analytics: A Reality
•  SAP Sybase IQ: Built for Big Data Analytics
•  SAP Sybase IQ: Continuing Innovation

Big Data Analytics: A Reality

Operational Efficiencies

Revenue Growth

New Strategies & Business Models

*A McKinsey study titled “Big Data: Next frontier for innovation, competition, and productivity”, May 2011, has found huge potential for Big Data Analytics with metrics as impressive as 60% improvements in Retail operating margins, 8% reduction in (US) national healthcare expenditures, and $150M savings in operational efficiencies in European economies

Business Value*

THE NEW DYNAMICS OF BUSINESS COMPETING ON BIG DATA DRIVEN ANALYTICS

Getting Value from Big Data

Find supply chain inefficiencies

Uncover insurance fraud

Dispense correct health care

Predict financial performance

Optimize stocking of products

Maintain customer loyalty

Applied Big Data Analytics

Business Value*

EDW AND BIG DATA PLATFORMS CONTRASTS

EDW -> Big Data

Large -> ENORMOUS
SQL -> Programmatic
Enterprise data, relational, structured, indexed text -> Clickstreams, sensors, log data, unstructured social media
Scale up -> Scale out
Schema -> No schema
OLAP -> Batch processing
Pre-processed data -> Raw data

Business Value*

EDW: Enterprise data, relational, structured, indexed text
Big Data: Clickstreams, sensors, log data, unstructured social media

•  Combine all relevant data for better insights
•  Real-time BI
•  SQL declarative processing
•  Big Data pre-processing with EDW deep analytics

EDW AND BIG DATA PLATFORMS PARTNERSHIP

Big-data analytics plus data warehousing deserves a new platform

Support massive numbers of users and workloads
Analyze massive volumes of complex data from many sources

•  Platform accessible to all business processes and all business users
•  Requirement for data and algorithms together in the platform
•  Ability to distribute interactions throughout the enterprise

Specialized apps

Web Mobile

Integrated workflow

Operational reporting

OLAP*

Data mining

Data loading

In-DB analytics

•  HDFS
•  MapReduce
•  RDBMS+
•  EDW

•  Volume
•  Velocity
•  Variety
•  Costs
•  Skills

*Online analytical processing
+Relational database management system

SAP SYBASE IQ: BUILT FOR BIG DATA ANALYTICS

Grid architecture: System scale out

Multi-dimensional scale out
•  Multiple resources can scale out independently
   –  Storage, server (CPU, memory), SAN switches, interconnect can scale on their own
•  Scale out is incremental and linear
   –  No need to add large units of monolithic CPU/storage pairs

Storage Fabric

Full Mesh Interconnect

Deployed use case: comScore Networks measures the digital world

•  comScore provides solutions for online audience measurement, e-commerce, advertising, search, video and mobile to analysts with digital marketing and vertical-specific industry expertise
•  Large SAP Sybase IQ Multiplex Grid on v15.x with tens of servers and hundreds of CPU cores
•  Manages more than 150 TB of data with trillions of rows and tens of thousands of tables
•  More than 200 concurrent users with a highly parallel and distributed workload
•  Incrementally scalable on commodity hardware

Storage Fabric

Community platform: Elastic virtual data marts

Virtual data marts
•  A VDM is a logical binding of mutually exclusive nodes, memory, and storage
   –  A Logical Server (LS) is a mutually exclusive logical binding of nodes and memory
   –  A Logical Server (LS) is a subset of a VDM
   –  Bindings are elastic, i.e. they can dynamically grow and shrink
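The elastic, mutually exclusive bindings described on this slide can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Grid` class, the node names and the `grow`/`shrink` methods are hypothetical and are not part of SAP Sybase IQ.

```python
class Grid:
    """Toy model of a grid whose nodes are bound to at most one logical server."""

    def __init__(self, nodes):
        self.free = set(nodes)   # nodes not bound to any logical server
        self.servers = {}        # logical server name -> set of bound nodes

    def grow(self, server, node):
        """Bind a free node to a logical server (bindings are mutually exclusive)."""
        if node not in self.free:
            raise ValueError(f"{node} is unknown or already bound")
        self.free.remove(node)
        self.servers.setdefault(server, set()).add(node)

    def shrink(self, server, node):
        """Release a node from a logical server back to the free pool."""
        self.servers[server].remove(node)
        self.free.add(node)


# Hypothetical four-node grid shared by two logical servers.
grid = Grid(["n1", "n2", "n3", "n4"])
grid.grow("LS1", "n1")
grid.grow("LS1", "n2")
grid.grow("LS2", "n3")
grid.shrink("LS1", "n2")   # LS1 shrinks ...
grid.grow("LS2", "n2")     # ... and LS2 grows with the released node
```

Keeping a single free pool is one simple way to guarantee that no node is ever bound to two logical servers at once.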

VDM1 VDM2 Shared

Virtual Shared Storage

Virtual Shared CPU, Memory

Full Mesh High Speed Interconnect

Storage Fabric

Logical Server 1 Logical Server 2

Robust load engine

Loading supports multiple modes:
•  Parallel bulk load processing:
   –  Load rates in excess of 250 GB/hr are common even with modest-size hardware nodes
•  Continuous and trickle feed via microbatching (change data capture)

Page-level snapshot versioning:
•  Allows non-blocking concurrent loads and queries
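Why snapshot versioning lets loads and queries run without blocking each other can be shown with a toy copy-on-write model. This is a minimal sketch under assumptions, not the SAP Sybase IQ implementation: the `VersionedTable` class and its methods are hypothetical names.

```python
class VersionedTable:
    """Toy table: every committed load publishes a new immutable page map."""

    def __init__(self):
        self.versions = [{}]  # list index = version id; value = {page_no: rows}

    def snapshot(self):
        """Return the latest committed version id for a reader to pin."""
        return len(self.versions) - 1

    def read(self, version):
        """Read all rows as of the pinned version, in page order."""
        pages = self.versions[version]
        return [row for page_no in sorted(pages) for row in pages[page_no]]

    def load(self, page_no, rows):
        """Copy-on-write load: only the touched page is rewritten."""
        new_pages = dict(self.versions[-1])          # untouched pages are shared
        new_pages[page_no] = tuple(new_pages.get(page_no, ())) + tuple(rows)
        self.versions.append(new_pages)              # publish as a new version
        return len(self.versions) - 1


table = VersionedTable()
table.load(0, ["r1", "r2"])
pinned = table.snapshot()        # a query starts and pins this version
table.load(0, ["r3"])            # loads continue while the query runs
table.load(1, ["r4"])
rows_seen_by_query = table.read(pinned)
rows_seen_by_new_query = table.read(table.snapshot())
```

The running query keeps seeing the rows from its pinned version, while a query started later sees everything loaded so far.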

Load from client machines

Full-mesh interconnect

Extraction, transformation, and load (ETL) in SAP software

Scale out Scale out ETL project 1 ETL project 1 ETL project 1

Storage fabric

Query engine: Distributed query processing

Massively parallel processing
•  Leader node: Receives and initiates queries
   –  Any node can be a leader
   –  Leader node may satisfy query within itself
•  Worker node: Nodes pick up work units from leader
   –  Many worker nodes per query
   –  Same worker node can serve multiple queries

Query 2 4 node DQP

Query 1 5 node DQP
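The leader/worker split can be sketched with a thread pool standing in for worker nodes. This is an illustrative sketch only: the function names, the unit size and the use of threads are assumptions, and say nothing about how SAP Sybase IQ actually schedules work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(rows, unit_size):
    """Leader side: cut the scan into fixed-size work units."""
    return [rows[i:i + unit_size] for i in range(0, len(rows), unit_size)]


def worker_scan(unit, predicate):
    """Worker side: evaluate one work unit and return a partial count."""
    return sum(1 for row in unit if predicate(row))


def leader_count(rows, predicate, workers=4, unit_size=1000):
    """Leader: run small queries locally, fan larger ones out to workers."""
    units = split_into_work_units(rows, unit_size)
    if len(units) <= 1:
        # Small query: the leader satisfies it within itself.
        return worker_scan(rows, predicate)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda unit: worker_scan(unit, predicate), units)
        return sum(partials)  # leader merges the partial results


even_rows = leader_count(list(range(10_000)), lambda r: r % 2 == 0)
small_query = leader_count([1, 2, 3], lambda r: r > 1)
```

Because workers pull whole work units, the same pool can serve units from several queries at once, which mirrors the "same worker node can serve multiple queries" point.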

Storage Fabric

Text search and analysis

Text index:

ID  Term  Pos Info
0   a     1,3,4
1   b     1,5
2   c     1
3   d     2,3,4
4   e     2,4,5
5   f     2,5

Table in SAP Sybase IQ:

Row  TextCol
1    a b c
2    f e e d
3    d a d
4    d e a d
5    b e e f

Full-text queries:
SELECT * FROM myTable WHERE CONTAINS (TextCol, 'd');            -- returns rows
SELECT * FROM myTable CONTAINS (TextCol, 'd');                  -- returns rows and scoring
SELECT * FROM myTable WHERE CONTAINS (TextCol, 'a AND NOT b');  -- Boolean
SELECT * FROM myTable WHERE CONTAINS (TextCol, 'a NEAR b');     -- proximity
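The text index on this slide behaves like an inverted index from term to row numbers. The sketch below rebuilds it from the five TextCol values, assuming the rows are numbered 1 to 5 in the order shown; the function names are hypothetical, and proximity (NEAR) is left out because it would also need term positions within each row.

```python
from collections import defaultdict

# The five TextCol values from the slide, keyed by assumed row number.
ROWS = {1: "a b c", 2: "f e e d", 3: "d a d", 4: "d e a d", 5: "b e e f"}


def build_index(rows):
    """Map each term to the set of row numbers that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for term in text.split():
            index[term].add(row_id)
    return index


def contains(index, term):
    """Rows matching a single-term query such as CONTAINS(TextCol, 'd')."""
    return set(index.get(term, set()))


def contains_and_not(index, wanted, unwanted):
    """Rows matching a Boolean query of the form 'wanted AND NOT unwanted'."""
    return contains(index, wanted) - contains(index, unwanted)


INDEX = build_index(ROWS)
rows_with_d = sorted(contains(INDEX, "d"))
rows_with_a_not_b = sorted(contains_and_not(INDEX, "a", "b"))
```

With this data, `rows_with_d` is `[2, 3, 4]`, matching the index entry for term d, and `rows_with_a_not_b` is `[3, 4]`.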

•  Full text queries
•  Visualization
•  Text load: File ingestion into blob or clob
•  Entity extraction: Categorization, tokenization
•  Schema transform: Hierarchical to relational
•  Text filtering: Filtering to plain text and formatting

Analytics simplified: Logic to data = fast and efficient

In-database analytics

No compromise for complex analytics:
•  Basic to advanced analytical functions available to SQL
•  Data never leaves the database until results are materialized
•  Analytics code and models are shareable
•  Analytics code and models are applicable to the latest data set
•  Average developer can build in-database analytical models

Process in SAP Sybase IQ

Built-in functions External DLL “A”

External DLL “A”

Database = logic and filtering applied in database

Federation with external file systems (Hadoop distributed file system)

1. Client-side federation: Join data from SAP Sybase IQ and Hadoop at a client-application level
2. Load Hadoop data into column store of SAP Sybase IQ (ETL): Extract, transform, and load data from Hadoop distributed file system (HDFS) into schemas of SAP Sybase IQ
3. Join HDFS data with data of SAP Sybase IQ on the fly: Fetch and join subsets of HDFS data on demand, using SQL queries from SAP Sybase IQ (data federation technique)
4. Combine results of Hadoop MapReduce (MR) jobs with SAP Sybase IQ data on the fly: Initiate and join results of MR jobs on demand using SQL queries from data in SAP Sybase IQ (query federation technique)
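Client-side federation, the simplest of the four techniques, amounts to fetching rows from both systems and joining them in the application. The sketch below shows one way to do that with a hash join over in-memory rows. Everything here is an assumption for illustration: the row layout, the `customer_id` key and the sample values are made up, and the fetch from each system is not shown.

```python
def client_side_join(iq_rows, hadoop_rows, key):
    """Hash join done in the client: index one side, then probe with the other."""
    lookup = {}
    for row in hadoop_rows:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    for row in iq_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})  # merge the columns of both sides
    return joined


# Hypothetical result sets already fetched from each system.
iq_rows = [
    {"customer_id": 1, "revenue": 120.0},
    {"customer_id": 2, "revenue": 75.5},
]
hadoop_rows = [
    {"customer_id": 1, "clicks": 42},
    {"customer_id": 1, "clicks": 7},
    {"customer_id": 3, "clicks": 9},
]

joined = client_side_join(iq_rows, hadoop_rows, "customer_id")
```

The trade-off is that both result sets travel to the client, which is why the other three techniques move the join closer to the data.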

•  TPFs (Table Parameterized Functions) consume/produce data sets in bulk
•  TPFs run in parallel
•  TPFs are fed with disjoint data sets
•  TPFs can be arbitrarily nested to multiple levels via sub-queries
•  TPFs currently available in popular, performance-efficient C++

Storage Fabric

SELECT (Reducer… (Mapper… OVER PARTITION BY…) OVER PARTITION BY…)

Parallel mapper TPFs …

Parallel reducer TPFs

Native MapReduce: Highly distributed processing without Hadoop
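The nested mapper/reducer pattern, where each function receives a disjoint partition and runs in parallel, can be sketched in plain Python. This is a conceptual illustration only: real TPFs are written in C++ and invoked from SQL, and the function names and word-count task below are assumptions chosen for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(items, key_fn):
    """Split items into disjoint partitions, like PARTITION BY."""
    parts = defaultdict(list)
    for item in items:
        parts[key_fn(item)].append(item)
    return parts


def run_parallel(fn, partitions):
    """Apply fn to every partition in parallel and concatenate the outputs."""
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(fn, partitions.values()))
    return [item for chunk in chunks for item in chunk]


def mapper(numbered_lines):
    """Consume a partition of (line_no, text) pairs, emit (word, 1) pairs."""
    return [(word, 1) for _, text in numbered_lines for word in text.split()]


def reducer(pairs):
    """Consume all pairs for one word, emit a single (word, total) pair."""
    return [(pairs[0][0], sum(count for _, count in pairs))]


def word_count(lines, map_partitions=2):
    numbered = list(enumerate(lines))
    mapped = run_parallel(mapper, partition_by(numbered, lambda p: p[0] % map_partitions))
    reduced = run_parallel(reducer, partition_by(mapped, lambda p: p[0]))
    return dict(reduced)


counts = word_count(["a b c", "f e e d", "d a d", "d e a d", "b e e f"])
```

The reducer stage consumes the mapper stage's output after repartitioning by word, which corresponds to the nested OVER PARTITION BY structure in the SELECT outline.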

SAP Sybase IQ: Continuing Innovation

IMPORTANT LEGAL DISCLAIMER CONCERNING PROGRAM DATES, RELEASE-RELATED INFORMATION & CONTENT All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.

SAP Sybase IQ: Next Wave Innovations for extremely large databases (XLDB)

Storage Architecture
•  New generation column store
•  New partitioning and compression

Query Processing
•  Data affinity
•  Aggressively parallel and distributed

Loading Engine
•  Fully parallel bulk loading
•  Real-time loading into delta store

System Reliability
•  Grid resiliency
•  Data availability

SAP Sybase IQ: Next Wave

Petabytes Real-time

Summary

SAP SYBASE IQ A COMPREHENSIVE PLATFORM FOR BIG DATA ANALYTICS

Eco-System

App. Services

DBMS

Most mature column store
Comprehensive lifecycle tiering
MPP queries + Virtual Marts + User scaling
High Speed loads
Structured + Unstructured Store
Comprehensive ANSI SQL w/OLAP
Built-in Full Text Search

InDB Analytics w/ MapReduce + simulator
Web 2.0 APIs
Big Data OpnSrc APIs

Optimized BI, EIM, Model, Replicate
Dev and admin tools
Predictive Analytics
Packaged ILM apps

Bradmark, Symantec, Whitesands, Quest, ZEND
SAS, SPSS, KXEN, Fuzzy Logix, Zementis, Visual Numerics
BMMSoft, SOLIX, PBS
Sybase PowerDesigner, Sybase Replication Server, SAP BusinessObjects ISYS, Panopticon
Hadoop, R

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.

IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9, iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server, PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes, BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX, Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.

Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.

Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.

Oracle and Java are registered trademarks of Oracle.

UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.

Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.

HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.

© 2012 SAP AG. All rights reserved.

SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.

Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company.

All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.

The  Bloor  Group  

The Universe of Big Data

In marketing terms BIG DATA is as big a trend as cloud computing (if you measure the trend in terms of column inches)

The Big Data Trend

•  Corporate data volumes grow at about 55% per annum
•  VLDB volumes grow at about 55% per annum
•  This is exponential
•  Data has been growing at this rate for at least 20 years
•  As such there is nothing new about big data other than the current data volumes, which follow a well-established trend
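To see what 55% annual growth compounds to, the arithmetic can be worked through directly. The sketch assumes the rate stays constant over the whole 20 years, which the slide only states approximately.

```python
import math

ANNUAL_GROWTH = 0.55   # "about 55% per annum" from the slide
YEARS = 20             # "at least 20 years" from the slide

# Compound growth: volume multiplies by (1 + rate) each year.
growth_factor = (1 + ANNUAL_GROWTH) ** YEARS

# Doubling time: solve (1 + rate) ** t == 2 for t.
doubling_time_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

print(f"Growth over {YEARS} years: about {growth_factor:,.0f}x")
print(f"Doubling time: about {doubling_time_years:.1f} years")
```

Under that assumption, data volumes grow roughly 6,400-fold over 20 years and double about every 1.6 years, which is the sense in which today's volumes follow an established trend.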

So What’s New?

•  Volume, velocity, variety, verifiability and other words beginning with V - but not all at once
•  Hadoop is new
•  Big Data in the cloud is new
•  And there’s a new dynamic in data analytics
•  Volume (and velocity) is now mostly about events, not transactions, and the world of embedded processors is going to expand the number of events worth processing

The Analytics Two-Step

The Future?

•  The data growth trend is likely to continue
•  More and more companies will be drawn into using Big Data technologies
•  Will the two-step become a one-step? Not sure. If you gather Big Data, you also need to be able to throw it away
•  RDBMS (column store) will remain as the analytics engine

•  Please explain why you believe that the Sybase IQ ‘shared some things’ architecture is equal to or better than a shared-nothing architecture.
•  Are you seeing the same trend that I seem to be noticing with Big Data in respect to analytics?
•  Roughly how many of your customers are using Hadoop?
•  If I were a Sybase customer, would you recommend Hadoop as an ETL mechanism, or is it your view that Sybase IQ can do it all?

•  Please describe the most extensive use of Sybase IQ (in respect of data volumes, daily ingest, instances, etc.).
•  How difficult is it to use (in other words, what are the labor/DBA overheads compared to a traditional RDBMS)?
•  Are your competitors always the “usual suspects” (i.e. other column store products)? Do you ever compete with the NoSQL crowd?
•  Explain how you usually fit with HANA in sites where both products are in use. Is HANA promoting sales of Sybase IQ?

•  This Month: Database
•  November: Cloud
•  December: Innovators
•  January: Big Data
•  2013 Editorial Calendar (www.insideanalysis.com)
