The New Possible: Very Big Data for Serious Business Value
Twitter Tag: #briefr
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
• November: Cloud
• December: Innovators
• January: Big Data
• February: Performance
• March: Integration
• Databases were designed primarily to store information for retrieval at a later time.
• Big Data requires big databases.
• The convergence of multi-structured data and the need to perform both transactional and operational analytics has led to substantial innovations in database technologies.
• Today some of the biggest databases blend the best of both worlds, transforming the way organizations store and analyze enterprise data.
• German-founded SAP is one of the largest software companies in the world. Its best-known products are SAP ERP, SAP Business Warehouse, SAP Business Objects, SAP Sybase IQ and SAP HANA.
• SAP offers a comprehensive set of database management solutions that spans the needs of the enterprise, leveraging in-memory, cloud and mobile technologies.
• Recent innovations include a Big Data analytics platform that loads, processes and delivers massive amounts of multi-structured data and is accessible on demand enterprise-wide.
Courtney Claussen is a product manager at Sybase, Inc., concentrating on Sybase's data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support, and product marketing in the areas of computer-aided design, computer-aided software engineering, database management systems, middleware, and analytics.
The New Possible: Very Big Data for Serious Business Value
The Briefing Room with Dr. Robin Bloor and SAP
October 9, 2012
© 2012 SAP AG. All rights reserved.
AGENDA
• Big Data Analytics: A Reality
• SAP Sybase IQ: Built for Big Data Analytics
• SAP Sybase IQ: Continuing Innovation
THE NEW DYNAMICS OF BUSINESS: COMPETING ON BIG DATA DRIVEN ANALYTICS
Business Value*
• Operational Efficiencies
• Revenue Growth
• New Strategies & Business Models
*A McKinsey study, “Big Data: The next frontier for innovation, competition, and productivity” (May 2011), found huge potential for Big Data analytics, with metrics as impressive as a 60% improvement in retail operating margins, an 8% reduction in US national healthcare expenditures, and roughly $150 billion in savings from operational efficiencies in European economies.
Getting Value from Big Data
Find supply chain inefficiencies
Uncover insurance fraud
Dispense correct health care
Predict financial performance
Optimize stocking of products
Maintain customer loyalty
Applied Big Data Analytics
EDW AND BIG DATA PLATFORMS: CONTRASTS
EDW -> Big Data
Large -> ENORMOUS
SQL -> Programmatic
Enterprise data, relational, structured, indexed text -> Clickstreams, sensors, log data, unstructured social media
Scale up -> Scale out
Schema -> No schema
OLAP -> Batch processing
Pre-processed data -> Raw data
EDW AND BIG DATA PLATFORMS: PARTNERSHIP
EDW: Enterprise data, relational, structured, indexed text
Big Data: Clickstreams, sensors, log data, unstructured social media
• Combine all relevant data for better insights
• Real-time BI
• SQL declarative processing
• Big Data pre-processing with EDW deep analytics
Big-data analytics plus data warehousing Deserves a new platform
Support massive numbers of users and workloads
Analyze massive volumes of complex data from many sources
• Platform accessible to all business processes and all business users
• Requirement for data and algorithms together in the platform
• Ability to distribute interactions throughout the enterprise
Specialized apps
Web Mobile
Integrated workflow
Operational reporting
OLAP*
Data mining
Data loading
In-DB analytics
HDFS
• MapReduce • RDBMS+ • EDW
• Volume • Velocity • Variety • Costs • Skills
* Online analytical processing
+ Relational database management system
Multi-dimensional scale out
• Multiple resources can scale out independently
– Storage, server (CPU, memory), SAN switches, and interconnect can scale on their own
• Scale out is incremental and linear
– No need to add large units of monolithic CPU/storage pairs
Grid architecture: System scale out
Storage Fabric
Full Mesh Interconnect
Deployed use case: comScore Networks measures the digital world
• comScore provides solutions for online audience measurement, e-commerce, advertising, search, video and mobile to analysts with digital marketing and vertical-specific industry expertise
• Large SAP Sybase IQ Multiplex Grid on v15.x with tens of servers and hundreds of CPU cores
• Manages more than 150 TB of data with trillions of rows and tens of thousands of tables
• More than 200 concurrent users with a highly parallel and distributed workload
• Incrementally scalable on commodity hardware
Storage Fabric
Virtual data marts
• A virtual data mart (VDM) is a logical binding of mutually exclusive nodes, memory, and storage
– A Logical Server (LS) is a mutually exclusive logical binding of nodes and memory
– A Logical Server (LS) is a subset of a VDM
– Bindings are elastic, i.e. they can dynamically grow and shrink
Community platform: Elastic virtual data marts
VDM1 VDM2 Shared
Virtual Shared Storage
Virtual Shared CPU, Memory
Full Mesh High Speed Interconnect
Storage Fabric
Logical Server 1 Logical Server 2
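The elastic, mutually exclusive bindings described for virtual data marts can be pictured with a toy model. The sketch below is not SAP Sybase IQ code: the `Grid` class and its `grow` and `shrink` methods are hypothetical names, and it assumes only that a node is bound to at most one logical server at a time and can be released back to a shared pool.

```python
class Grid:
    """Toy model: each node belongs to at most one logical server at a time."""

    def __init__(self, nodes):
        self.free = set(nodes)   # nodes not bound to any logical server
        self.servers = {}        # logical server name -> set of bound nodes

    def grow(self, server, count):
        """Bind `count` free nodes to `server`; fail if the pool is too small."""
        if count > len(self.free):
            raise ValueError("not enough free nodes")
        taken = {self.free.pop() for _ in range(count)}
        self.servers.setdefault(server, set()).update(taken)

    def shrink(self, server, count):
        """Release `count` nodes from `server` back to the shared pool."""
        nodes = self.servers[server]
        if count > len(nodes):
            raise ValueError("server has fewer nodes than requested")
        for _ in range(count):
            self.free.add(nodes.pop())


grid = Grid(["n1", "n2", "n3", "n4", "n5"])
grid.grow("LS1", 3)
grid.grow("LS2", 2)
grid.shrink("LS1", 1)   # LS1 gives a node back to the pool
grid.grow("LS2", 1)     # LS2 picks up the released node
```

Because nodes move only through the free pool, the two logical servers can never hold the same node, which is the mutual exclusivity the slide refers to.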
Robust load engine
Loading is supported in multiple modes:
• Parallel bulk load processing:
– Load rates in excess of 250 GB/hr are common even with modest-size hardware nodes
• Continuous and trickle feed via microbatching (change data capture)
Page-level snapshot versioning:
• Allows non-blocking concurrent loads and queries
Load from client machines
Full-mesh interconnect
Extraction, transformation, and load (ETL) in SAP software
Scale out Scale out ETL project 1 ETL project 1 ETL project 1
Storage fabric
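Microbatching and snapshot versioning from the load-engine slide can be illustrated with a minimal sketch. It is a simplification under stated assumptions: the `microbatches` helper and the `Table` class are hypothetical, and versioning is modeled per table with immutable tuples rather than per page as in the product.

```python
def microbatches(events, batch_size):
    """Yield successive small batches from a stream of change events."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


class Table:
    """Toy table whose readers keep an immutable snapshot while loads continue."""

    def __init__(self):
        self._rows = ()    # current committed version (immutable tuple)
        self.version = 0

    def snapshot(self):
        # A reader holds on to this version even if later loads commit.
        return self._rows

    def load(self, batch):
        # Build a new version and swap it in; existing snapshots are untouched.
        self._rows = self._rows + tuple(batch)
        self.version += 1


table = Table()
before = table.snapshot()
for batch in microbatches(range(7), batch_size=3):
    table.load(batch)
after = table.snapshot()
```

A query that took `before` still sees an empty table after three loads have committed, so loads and queries do not block each other.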
Massively parallel processing
• Leader node: Receives and initiates queries
– Any node can be a leader
– Leader node may satisfy query within itself
• Worker node: Picks up work units from the leader
– Many worker nodes per query
– Same worker node can serve multiple queries
Query engine: Distributed query processing
Query 2: 4 node DQP
Query 1: 5 node DQP
Storage Fabric
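The leader and worker roles in distributed query processing can be sketched in a few lines. This is a hypothetical Python model, not the SAP Sybase IQ engine: `run_query`, `num_workers`, and `work_unit_size` are illustrative names, and threads stand in for worker nodes. The leader splits a sum over rows into work units, workers pull units from a shared queue, and the leader combines the partial results.

```python
import queue
import threading


def run_query(rows, num_workers=4, work_unit_size=3):
    """Leader: split a SUM over `rows` into work units and merge the partials."""
    units = queue.Queue()
    for start in range(0, len(rows), work_unit_size):
        units.put(rows[start:start + work_unit_size])

    partials = []
    lock = threading.Lock()

    def worker():
        # Each worker keeps picking up work units until none remain.
        while True:
            try:
                unit = units.get_nowait()
            except queue.Empty:
                return
            subtotal = sum(unit)
            with lock:
                partials.append(subtotal)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The leader assembles the final answer from the workers' partial results.
    return sum(partials)


total = run_query(list(range(1, 11)))
```

Workers pull work rather than being assigned it, so a slow worker simply processes fewer units instead of holding up the query.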
Text search and analysis

Table in SAP Sybase IQ
TextCol
a b c
f e e d
d a d
d e a d
b e e f
…

Text index
ID  Term  Pos Info
0   a     1,3,4
1   b     1,5
2   c     1
3   d     2,3,4
4   e     2,4,5
5   f     2,5

Full-text queries:
SELECT * FROM myTable WHERE CONTAINS (TextCol, 'd'); -- returns rows
SELECT * FROM myTable CONTAINS (TextCol, 'd'); -- returns rows and scoring
SELECT * FROM myTable WHERE CONTAINS (TextCol, 'a AND NOT b'); -- Boolean
SELECT * FROM myTable WHERE CONTAINS (TextCol, 'a NEAR b'); -- proximity

Text capabilities:
• Full text queries
• Visualization
• Text load: File ingestion into blob or clob
• Entity extraction: Categorization, tokenization
• Schema transform: Hierarchical to relational
• Text filtering: Filtering to plain text and formatting
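The text index on the text search slide is an inverted index: each term maps to the rows that contain it. The sketch below rebuilds that index from the slide's sample TextCol values, assuming rows are numbered from 1 and that "Pos Info" lists row numbers, which is consistent with the sample data. The function names are illustrative and are not part of SAP Sybase IQ.

```python
from collections import defaultdict

# Rows of TextCol from the slide, numbered from 1 (an assumption).
rows = {1: "a b c", 2: "f e e d", 3: "d a d", 4: "d e a d", 5: "b e e f"}


def build_index(rows):
    """Map each term to the sorted list of row numbers that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for term in text.split():
            index[term].add(row_id)
    return {term: sorted(ids) for term, ids in index.items()}


def contains(index, term):
    """Rows matching a single-term query such as CONTAINS (TextCol, 'd')."""
    return index.get(term, [])


def contains_and_not(index, wanted, unwanted):
    """Rows matching a Boolean query such as 'a AND NOT b'."""
    return sorted(set(contains(index, wanted)) - set(contains(index, unwanted)))


index = build_index(rows)
print(index["a"])                         # [1, 3, 4]
print(contains(index, "d"))               # [2, 3, 4]
print(contains_and_not(index, "a", "b"))  # [3, 4]
```

The Boolean query becomes a set operation on two posting lists, so no row text has to be scanned at query time.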
Analytics simplified: Logic to data = fast and efficient
In-database analytics
No compromise for complex analytics:
• Basic to advanced analytical functions available to SQL
• Data never leaves the database until results are materialized
• Analytics code and models are shareable
• Analytics code and models are applicable to the latest data set
• Average developer can build in-database analytical models
Process in SAP Sybase IQ
Built-in functions External DLL “A”
External DLL “A”
Database = logic and filtering applied in database
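The "logic to data" idea can be demonstrated with any engine that accepts user-defined functions. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for SAP Sybase IQ; the `sales` table, the `Range` class, and the `value_range` aggregate are hypothetical. A custom aggregate is registered with the engine, so only the summarized result leaves the database.

```python
import sqlite3


class Range:
    """User-defined aggregate: max minus min, evaluated by the SQL engine."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per row in the group; NULLs are ignored.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        return None if self.lo is None else self.hi - self.lo


conn = sqlite3.connect(":memory:")
conn.create_aggregate("value_range", 1, Range)
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 25.0), ("west", 7.0), ("west", 9.5)],
)

# One summary row per region is returned, not the raw rows.
summary = conn.execute(
    "SELECT region, value_range(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('east', 15.0), ('west', 2.5)]
```

Once registered, the aggregate is callable from any SQL statement on that connection, which mirrors the slide's point that analytics code becomes shareable and always runs against the latest data.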
Federation with external file systems (Hadoop distributed file system)
1. Client-side federation: Join data from SAP Sybase IQ and Hadoop at a client-application level
2. Load Hadoop data into column store of SAP Sybase IQ (ETL): Extract, transform, and load data from Hadoop distributed file system (HDFS) into schemas of SAP Sybase IQ
3. Join HDFS data with data of SAP Sybase IQ on the fly: Fetch and join subsets of HDFS data on demand, using SQL queries from SAP Sybase IQ (data federation technique)
4. Combine results of Hadoop MapReduce (MR) jobs with SAP Sybase IQ data on the fly: Initiate and join results of MR jobs on demand using SQL queries from data in SAP Sybase IQ (query federation technique)
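Client-side federation, the first of the four Hadoop integration options, amounts to joining two result sets inside the application. A minimal sketch, assuming both result sets have already been fetched as lists of dictionaries (the data, the `customer_id` key, and the `client_side_join` helper are all hypothetical):

```python
def client_side_join(iq_rows, hadoop_rows, key):
    """Hash-join two lists of dicts on `key` inside the client application."""
    by_key = {}
    for row in hadoop_rows:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for left in iq_rows:
        for right in by_key.get(left[key], []):
            joined.append({**right, **left})
    return joined


# Hypothetical result sets: one fetched from the warehouse, one from Hadoop.
iq_rows = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
hadoop_rows = [
    {"customer_id": 1, "clicks": 40},
    {"customer_id": 1, "clicks": 2},
    {"customer_id": 3, "clicks": 7},
]
result = client_side_join(iq_rows, hadoop_rows, "customer_id")
```

The trade-off is that both result sets must travel to the client before the join, which is why the other three options move the join or the data into the database.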
Native MapReduce: Highly distributed processing without Hadoop
• TPFs (Table Parameterized Functions) consume/produce data sets in bulk
• TPFs run in parallel
• TPFs are fed with disjoint data sets
• TPFs can be arbitrarily nested to multiple levels via sub-queries
• TPFs are currently available in popular, performance-efficient C++
SELECT (Reducer… (Mapper… OVER PARTITION BY…) OVER PARTITION BY…)
Parallel mapper TPFs …
Parallel reducer TPFs
Storage Fabric
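The nested mapper and reducer pattern behind native MapReduce can be sketched outside the database. The example below is an analogy in Python, not TPF code (the slide notes TPFs are written in C++): `partition_by` plays the role of PARTITION BY, and the mapper and reducer each receive disjoint partitions and run in parallel on a thread pool. All function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(rows, key_fn):
    """Split rows into disjoint partitions, in the spirit of PARTITION BY."""
    parts = defaultdict(list)
    for row in rows:
        parts[key_fn(row)].append(row)
    return parts


def mapper(partition):
    """Emit (word, 1) pairs for every word in a partition of text lines."""
    return [(word, 1) for line in partition for word in line.split()]


def reducer(item):
    """Sum the counts for one word."""
    word, pairs = item
    return word, sum(count for _, count in pairs)


def word_count(lines, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: each mapper call sees a disjoint subset of the input.
        input_parts = partition_by(lines, lambda line: len(line) % workers)
        mapped = [pair for out in pool.map(mapper, input_parts.values()) for pair in out]
        # Reduce phase: repartition mapper output by word, one reducer call per word.
        by_word = partition_by(mapped, lambda pair: pair[0])
        return dict(pool.map(reducer, by_word.items()))


counts = word_count(["a b c", "f e e d", "d a d"])
```

The reducer consumes the mapper's output after a second partitioning step, which corresponds to the outer OVER PARTITION BY wrapping the inner one in the slide's SELECT.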
SAP Sybase IQ: Continuing Innovation
IMPORTANT LEGAL DISCLAIMER CONCERNING PROGRAM DATES, RELEASE-RELATED INFORMATION & CONTENT All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
SAP Sybase IQ: Next Wave Innovations for extremely large databases (XLDB)
Storage Architecture
• New generation column store
• New partitioning and compression
Query Processing
• Data affinity
• Aggressively parallel and distributed
Loading Engine
• Fully parallel bulk loading
• Real-time loading into delta store
System Reliability
• Grid resiliency
• Data availability
SAP Sybase IQ: Next Wave
Petabytes Real-time
SAP SYBASE IQ: A COMPREHENSIVE PLATFORM FOR BIG DATA ANALYTICS
Eco-System
App. Services
DBMS
Most mature column store
Comprehensive lifecycle tiering
MPP queries + Virtual Marts + User scaling
High Speed loads
Structured + Unstructured Store
Comprehensive ANSI SQL w/OLAP
Built-in Full Text Search
InDB Analytics w/ MapReduce + simulator Web 2.0 APIs Big Data
OpnSrc APIs
Optimized BI, EIM, Model, Replicate Dev and admin tools Predictive Analytics Packaged ILM apps
Bradmark, Symantec, Whitesands, Quest, ZEND
SAS, SPSS, KXEN, Fuzzy Logix,
Zementis, Visual Numerics
BMMSoft, SOLIX, PBS
Sybase PowerDesigner, Sybase Replication Server,
SAP BusinessObjects ISYS, Panopticon
Hadoop, R
The Bloor Group
In marketing terms, BIG DATA is as big a trend as cloud computing (if you measure the trend in terms of column inches)
The Big Data Trend
• Corporate data volumes grow at about 55% per annum
• VLDB volumes grow at about 55% per annum
• This is exponential
• Data has been growing at this rate for at least 20 years
• As such, there is nothing new about big data other than the current data volumes, which follow a well-established trend
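To see what 55% annual growth sustained for 20 years implies, the compounding can be worked out directly. The figures 55% and 20 years come from the slide; the rest is plain arithmetic, sketched here in Python.

```python
import math

annual_growth = 0.55   # 55% per annum, from the slide
years = 20

# Compound growth: volume multiplies by (1 + rate) every year.
growth_factor = (1 + annual_growth) ** years
# Time for volume to double at this rate.
doubling_time = math.log(2) / math.log(1 + annual_growth)

print(f"Growth over {years} years: about {growth_factor:,.0f}x")
print(f"Doubling time: about {doubling_time:.2f} years")
```

At this rate data volumes double roughly every 1.6 years and grow by a factor of about 6,400 over two decades, which is why today's volumes look dramatic even though the trend itself is unchanged.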
So What’s New?
• Volume, velocity, variety, verifiability and other words beginning with V - but not all at once
• Hadoop is new
• Big Data in the cloud is new
• And there’s a new dynamic in data analytics
• Volume (and velocity) is now mostly about events, not transactions, and the world of embedded processors is going to expand the number of events worth processing
The Future?
• The data growth trend is likely to continue
• More and more companies will be drawn into using Big Data technologies
• Will the two-step become a one-step? Not sure. If you gather Big Data, you also need to be able to throw it away
• RDBMS (column store) will remain as the analytics engine
• Please explain why you believe that the Sybase IQ ‘shared some things’ architecture is equal to or better than a shared nothing architecture?
• Are you seeing the same trend that I seem to be noticing with Big Data in respect to analytics?
• Roughly how many of your customers are using Hadoop?
• If I were a Sybase customer would you recommend Hadoop as an ETL mechanism or is it your view that Sybase IQ can do it all?
• Please describe the most extensive use of Sybase IQ (in respect of data volumes, daily ingest, instances, etc.).
• How difficult is it to use (in other words, what are the labor/DBA overheads compared to a traditional RDBMS)?
• Are your competitors always the “usual suspects” (i.e. other column store products)? Do you ever compete with the NoSQL crowd?
• Explain how you usually fit with HANA in sites where both products are in use. Is HANA promoting sales of Sybase IQ?
• This Month: Database
• November: Cloud
• December: Innovators
• January: Big Data
• 2013 Editorial Calendar (www.insideanalysis.com)