Current Trends and Challenges in Big Data Benchmarking
Kai Sachs, SPEC Research Group
May 2014
Benchmark Use Cases & Stakeholders

Hardware & Software Vendors:
  Publish results & marketing
  Example: more than 27,500 results submitted for the SPEC CPU2006 benchmarks alone
Developer:
  Analysis & product quality
  Example: regression performance testing (see the sketch below)
Consumer:
  Compare different products
  Example: find the best video card for gaming
IT Architect:
  Cloud & hardware sizing
  Example: choosing a configuration
Researcher:
  Example: evaluate one's own implementation using a standardized workload
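To make the developer use case concrete, here is a minimal sketch of a regression performance test: it warms up, times an operation repeatedly, and fails if the median latency exceeds a stored baseline by more than a tolerance. The operation, the baseline value and the 10% tolerance are hypothetical, not taken from the slides.

```java
import java.util.Arrays;

// Minimal regression performance test sketch (hypothetical names and thresholds).
public class RegressionPerfTest {

    // Stand-in for the operation under test; replace with real product code.
    static long criticalOperation() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        final int runs = 50;
        long[] samplesNs = new long[runs];

        // Warm-up so the JIT compiles the hot path before we measure.
        for (int i = 0; i < 1_000; i++) criticalOperation();

        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            criticalOperation();
            samplesNs[i] = System.nanoTime() - start;
        }

        Arrays.sort(samplesNs);
        long medianNs = samplesNs[runs / 2];

        // Baseline from a previously accepted build; 10% tolerance (both invented).
        long baselineNs = 60_000;
        double tolerance = 1.10;

        System.out.printf("median = %d ns (baseline = %d ns)%n", medianNs, baselineNs);
        if (medianNs > baselineNs * tolerance) {
            throw new AssertionError("Performance regression: median latency exceeds baseline by >10%");
        }
    }
}
```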
Standard Performance Evaluation Corporation

Founded 1988; more than 80 member organizations & associates

Development of industry-standard benchmarks:
  OSG (Open Systems Group): CPU, Java, Virtualization, Power, …
  HPG (High Performance Group): OpenMP, MPI, …
  GWPG (Graphics and Workstation Performance Group)

Research platform:
  RG (Research Group): Cloud, Intrusion Detection Systems, Big Data
SPEC Research Group – Mission Statement

Provide a platform for collaborative research efforts in the areas of
  computer benchmarking and
  quantitative system analysis
Portal for all kinds of benchmarking-related resources
Provide research benchmarks, tools, metrics and scenarios
Performance

Performance in a broad sense:
  Classical performance metrics
    Examples: response time, throughput, scalability, efficiency, and elasticity
  Non-functional system properties, grouped under the term dependability
    Examples: availability, reliability, and security
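As an illustration of the classical metrics, here is a small sketch (not SPEC tooling) that derives throughput and a response-time mean and percentile from raw measurements; the sample values and the measurement window are invented.

```java
import java.util.Arrays;

// Sketch: deriving classical performance metrics from raw measurements.
public class Metrics {
    public static void main(String[] args) {
        // Hypothetical response times of completed requests, in milliseconds.
        double[] responseMs = {12.1, 9.8, 15.3, 11.0, 48.7, 10.2, 13.9, 9.5, 14.4, 11.8};
        double wallClockSeconds = 2.0; // measurement window (also invented)

        // Throughput: completed operations per unit of time.
        double throughput = responseMs.length / wallClockSeconds;

        // Response-time statistics: mean and 90th percentile (nearest-rank method).
        double mean = Arrays.stream(responseMs).average().orElse(Double.NaN);
        double[] sorted = responseMs.clone();
        Arrays.sort(sorted);
        double p90 = sorted[(int) Math.ceil(0.9 * sorted.length) - 1];

        System.out.printf("throughput = %.1f ops/s, mean = %.1f ms, p90 = %.1f ms%n",
                throughput, mean, p90);
    }
}
```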
Big Data Benchmarking Community (BDBC) – Towards a Big Data Standard Benchmark

'Incubator' for Big Data standard benchmark(s) for industry
  More than 200 members on the mailing list
Workshop on Big Data Benchmarking (WBDB) series
  2012 in San Jose, CA & Pune, India; 2013 in San Jose, CA & Xi'an, China; 2014 in Potsdam, Germany
  Post-proceedings published in LNCS
BDBC is joining the SPEC Research Group
  RG working group focusing on Big Data in preparation
  Working group chairs: Chaitan Baru, Tilmann Rabl

WBDB 2012 Report: Setting the Direction for Big Data Benchmark Standards
C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl. TPCTC 2012, collocated with VLDB 2012
Other Benchmark Organizations

Transaction Processing Performance Council (TPC)
  Focus: transaction processing and database benchmarks
  Best-known benchmarks: TPC-C (OLTP), TPC-E (OLTP), TPC-H (decision support)
Embedded Microprocessor Benchmark Consortium (EEMBC)
  Focus: hardware and software used in embedded systems
Business Applications Performance Corporation (BAPCo)
  Focus: performance benchmarks for personal computers, based on popular applications and industry-standard operating systems
Workshop on Big Data Benchmarking (WBDB) 2014 – Call for Papers

General Chairs: Chaitan Baru (UC San Diego), Tilmann Rabl (U Toronto), Kai Sachs (SAP)
Local Arrangements: Matthias Uflacker (Hasso Plattner Institute)
Publicity Chair: Henning Schmitz (SAP Innovations Center)
Publication Chair: Meikel Poess (Oracle)
Keynote Speakers: Umesh Dayal, Alexandru Iosup
Submission: May 30, 2014 (6pm PDT); short versions of papers (4-8 LNCS pages)

Program Committee:
Milind Bhandarkar (Pivotal), Anja Bog (SAP Labs), Dhruba Borthakur (Facebook), Joos-Hendrik Böse (Amazon), Tobias Bürger (Payback), Tyson Condie (UCLA), Kshitij Doshi (Intel), Pedro Furtado (U Coimbra), Bhaskar Gowda (Intel), Goetz Graefe (HP), Martin Grund (Exascale), Alfons Kemper (TU München), Donald Kossmann (ETH Zürich), Tim Kraska (Brown University), Wolfgang Lehner (TU Dresden), Christof Leng (UC Berkeley), Stefan Manegold (CWI), Raghu Nambiar (Cisco), Manoj K. Nambiar (TCS), Glenn Paulley (Conestoga Col.), Scott Pearson (CLDS Industry Fellow), Andreas Polze (HPI), Alexander Reinefeld (HU Berlin), Berni Schiefer (IBM Labs Toronto), Saptak Sen (Hortonworks), Florian Stegmaier (University of Passau), Till Westmann (Oracle Labs), Jianfeng Zhan (Chinese Academy of Sciences)
Benchmark Engineering
Past & Present

Past: it was common to write a for-loop and call it a benchmark (see the sketch below).
Present: benchmarks are complex pieces of software and specifications, and benchmark development has turned into a complex team effort.
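For illustration, a caricature of the 'past' style: a bare for-loop timed with a wall clock. The comments note why such numbers are unreliable on a modern JVM; the loop body is invented.

```java
// The classic for-loop "benchmark" of the past (illustrative only).
public class ForLoopBenchmark {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        double x = 0;
        for (int i = 1; i <= 10_000_000; i++) {
            x += Math.sqrt(i); // arbitrary work, invented for this sketch
        }
        long elapsed = System.currentTimeMillis() - start;
        // Pitfalls a modern benchmark must address:
        // - no warm-up: the JIT compiles the loop while it is being timed
        // - if x were unused, the JIT could eliminate the loop entirely
        // - a single run: no variance estimate, no confidence in the result
        System.out.println("elapsed = " + elapsed + " ms (x = " + x + ")");
    }
}
```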
The Whetstone Benchmark (1974 – 284 lines)
Curnow, H.J., Wichmann, B.A.: "A Synthetic Benchmark". The Computer Journal, Volume 19, Issue 1, Feb. 1976, pp. 43-49
SPEC CPU Benchmark Suite – Lines of Code
Henning, J.: "SPEC CPU suite growth: an historical perspective". SIGARCH Comput. Archit. News 35, Issue 1, March 2007
Example Components of a Standard Benchmark

A standard benchmark comprises: workload, metrics, run rules, reporter, documentation, and an implementation & framework (optional).
The workload specification is the most important part.

Performance Evaluation of Message-Oriented Middleware Using the SPECjms2007 Benchmark
Kai Sachs, Samuel Kounev, Jean Bacon, Alejandro Buchmann. Performance Evaluation, 2009
Performance Modeling and Benchmarking of Event-Based Systems
Kai Sachs. PhD Thesis, TU Darmstadt, 2010
Workload Requirements

Representativeness
Comprehensiveness
Focus
Scalability
Configurability

Resilience Benchmarking
Marco Vieira, Henrique Madeira, Kai Sachs, Samuel Kounev. In: Resilience Assessment and Evaluation, Springer, 2012
Workload Description ‘Level’
From TPC-C to Big Data Benchmarks: A Functional Workload Model
Yanpei Chen, Francois Raab, and Randy Katz in Workshop on Big Data Benchmarks, 2012.
Current Trends & Challenges in Big Data Benchmarking
Current Trends & Challenges in Benchmarking

Technology:
  Virtualization
  Cloud
  (Big) Data: MapReduce, mixed workloads (OLAP/OLTP), data/event streaming, …
Benchmarking methodology:
  Large-scale systems
Tools:
  Data/workload generators
  Power consumption
  Simulation frameworks
  Generic benchmarking frameworks

[Diagram: a benchmark sits at the intersection of technologies, methodologies, and tools]
Benchmark Methodology: System Under Test

Past & present:
  Single node
  Multiple nodes
  Isolated systems
Benchmark Methodology: System Under Test

[Photo: St. Peter's Square, 2005 vs. 2013 – http://instagram.com/p/W2FCksR9-e/]
Benchmark Methodology: System Under Test

Challenge: large-scale systems
  Isolation is not guaranteed (or is impossible)
  High number of nodes
  Very large data volumes
  Repeatability is an issue
How can we benchmark such systems?
“Big Data should be Interesting Data! There are various definitions of Big Data; most center around a number of V’s like volume, velocity, variety, veracity – in short: interesting data (interesting in at least one aspect). However, when you look into research papers on Big Data, in SIGMOD, VLDB, or ICDE, the data that you see here in experimental studies is utterly boring. Performance and scalability experiments are often based on the TPC-H benchmark: completely synthetic data with a synthetic workload that has been beaten to death for the last twenty years. Data quality, data cleaning, and data integration studies are often based on bibliographic data from DBLP, usually old versions with less than a million publications, prolific authors, and curated records. I doubt that this is a real challenge for tasks like entity linkage or data cleaning. So where’s the – interesting – data in Big Data research?”

Gerhard Weikum: “Where’s the Data in the Big Data Wave?” – SIGMOD Blog, March 2013
Big Data Benchmark: Issues and Challenges

'Big Data World'
Communities
Benchmark design:
  Single benchmark vs. benchmark collection
  Component vs. end-to-end scenario
  Specification vs. implementation
Metric
System under test
Workload
Abstractions of the Big Data World from WBDB

Enterprise warehouse + agglomeration of other data:
  Structured enterprise data warehouse, extended to incorporate data from other, non-fully-structured data sources (e.g. weblogs, text, streams)
Pool of data with a sequence of processing steps:
  Enterprise data processing as a pipeline from data ingestion to transformation, extraction, subsetting, machine learning, and predictive analytics
  Data from multiple structured and non-structured sources

Introduction to the 4th Workshop on Big Data Benchmarking
Chaitan Baru
BigBench: A Big Data Analytics Benchmark – Data Model

Scenario: retail domain
Data:
  Structured: based on TPC-DS
  Semi-structured: click streams
  Unstructured: product reviews
PDGF is used to generate the data

BigBench: Towards an Industry Standard Benchmark for Big Data Analytics
A. Ghazal, M. Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H.-A. Jacobsen. SIGMOD 2013
BigBench: A Big Data Analytics Benchmark – Data Generation for Unstructured Data

Extended version of the Parallel Data Generation Framework (PDGF)
Separate review generator

BigBench: Towards an Industry Standard Benchmark for Big Data Analytics
A. Ghazal, M. Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H.-A. Jacobsen. SIGMOD 2013
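The core idea that makes PDGF-style generation scalable and repeatable is that every cell value is a pure, seeded function of its (table, row, column) coordinates, so any node can generate any partition independently and deterministically. The sketch below shows only this idea; it is not PDGF's actual API, and all names and constants are invented.

```java
import java.util.Random;

// Sketch of seeded, coordinate-based data generation (the idea behind PDGF; not its API).
public class DeterministicGenerator {
    static final long GLOBAL_SEED = 42L; // one seed fixes the whole data set

    // The value of cell (table, row, column) is a pure function of its coordinates.
    static long cellValue(int table, long row, int column) {
        long seed = GLOBAL_SEED ^ (table * 0x9E3779B97F4A7C15L)
                                ^ (row * 0xC2B2AE3D27D4EB4FL) ^ column;
        return new Random(seed).nextLong();
    }

    public static void main(String[] args) {
        // Two "nodes" generating disjoint row ranges of table 0 in parallel
        // would produce exactly these rows, in any order, on any run.
        for (long row = 0; row < 4; row++) {
            System.out.printf("row %d: col0=%d col1=%d%n",
                    row, cellValue(0, row, 0), cellValue(0, row, 1));
        }
    }
}
```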
Deep Analytics Pipeline

An end-to-end data processing pipeline:
  Data from multiple sources
  Loose, flexible schema
  Data requires structuring
Application characteristics:
  Processing pipelines
  Running models with data

Introduction to the 4th Workshop on Big Data Benchmarking
Chaitan Baru
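A minimal sketch of the shape of such a pipeline: loosely structured records from several sources are parsed, given a schema, and fed into an analytics stage. All names and the toy data are invented; the point is only the ingestion → structuring → analytics sequence described above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of an end-to-end pipeline: ingest -> structure -> model (all names invented).
public class DeepAnalyticsPipeline {
    public static void main(String[] args) {
        // Ingestion: raw, loosely structured records from multiple sources.
        Stream<String> raw = Stream.of("web|u1|click", "log|u2|view", "web|u1|buy");

        // Structuring: impose a schema on the loose input.
        List<Map<String, String>> events = raw
                .map(line -> line.split("\\|"))
                .map(f -> Map.of("source", f[0], "user", f[1], "action", f[2]))
                .collect(Collectors.toList());

        // Analytics/model stage: here, simply count actions per user.
        Map<String, Long> actionsPerUser = events.stream()
                .collect(Collectors.groupingBy(e -> e.get("user"), Collectors.counting()));

        System.out.println(actionsPerUser); // e.g. {u1=2, u2=1}
    }
}
```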
Example Application: Determining a User Interest Profile by Mining Activities

Scalable Distributed Inference of Dynamic User Interests for Behavioral Targeting
A. Ahmed, Y. Low, M. Aly, V. Josifovski, A.J. Smola. SIGKDD 2011
Composite Benchmark for Transactions and Reporting (CBTR)

OLTP & OLAP benchmark based on current and real enterprise data
Order-to-cash scenario: 18 tables with 5 to 327 columns, 2,316 columns in total
Variable workload mix (see the sketch below):
  OLTP sub-workload share: S_T ∈ [0, 1]
  OLAP sub-workload share: S_A = 1 − S_T
  Read-only share of the OLTP queries: S_T^r ∈ [0, 1]
  Mixed share of the OLTP queries: S_T^m = 1 − S_T^r
  (S: share; T: transactional, A: analytical; r: read-only, m: mixed)

Benchmarking Composite Transaction and Analytical Processing Systems
Anja Bog. PhD Thesis, University of Potsdam, 2012
Interactive Performance Monitoring of a Composite OLTP & OLAP Workload
Anja Bog, Kai Sachs, Hasso Plattner. SIGMOD 2012 (Demo)
Normalization in a Mixed OLTP and OLAP Workload Scenario
Anja Bog, Kai Sachs, Alexander Zeier, Hasso Plattner. TPCTC 2011, collocated with VLDB 2011
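To make the share parameters concrete, here is a sketch (not the actual CBTR driver) that dispatches requests according to S_T, S_A = 1 − S_T, S_T^r and S_T^m = 1 − S_T^r; the concrete share values are invented.

```java
import java.util.Random;

// Sketch of a CBTR-style variable workload mix (not the actual CBTR driver).
public class WorkloadMix {
    public static void main(String[] args) {
        double sT  = 0.7;  // OLTP share S_T (hypothetical value); S_A = 1 - S_T
        double sTr = 0.8;  // read-only OLTP share S_T^r; S_T^m = 1 - S_T^r
        Random rng = new Random(1);

        int oltpRead = 0, oltpMixed = 0, olap = 0;
        for (int i = 0; i < 100_000; i++) {
            if (rng.nextDouble() < sT) {                // OLTP request
                if (rng.nextDouble() < sTr) oltpRead++; // read-only OLTP query
                else oltpMixed++;                       // mixed (read-write) OLTP query
            } else {
                olap++;                                 // analytical (OLAP) query
            }
        }
        System.out.printf("OLTP read-only: %d, OLTP mixed: %d, OLAP: %d%n",
                oltpRead, oltpMixed, olap);
    }
}
```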
Big Data & Cloud Benchmark
Related Work – Virtualization Benchmarking
Other Activities

TPC-BD
  TPC announced a Big Data working group (November 2013)
Graph 500
  Driven by the HPC community
  Cooperating with the SPEC CPU group
  Green Graph 500 list
SPEC OSG
  Big Data as part of a cloud benchmark
Further benchmarks: CloudSuite 2.0, CH-benCHmark, BigDataBench, HiBench, LinkBench, …
SPEC RG – Big Data Working Group: Potential Topics

Target group:
  Researchers & developers
Data categories:
  Structured, unstructured and semi-structured; events & streams; graphs; geospatial, retail, astronomy & genomic data; …
Benchmark scenarios & metrics:
  Realistic use cases & workload mixes
Big Data classification schema
(Research) standard benchmarks:
  BigBench, Deep Analytics Pipeline, …
Data generation:
  Real-world traces & synthetic data; tooling
Conclusions

Benchmarking is more than throughput
Meaningful workloads are most important
More research is needed:
  Benchmarking of large-scale systems
  'Big Data World': workloads & scenarios
  Benchmarks for Big Data

"We Don't Know Enough to Make a Big Data Benchmark Suite"
Yanpei Chen, WBDB 2012
Thank you
Contact information:
Kai Sachs
Email: [email protected]
Disclaimer:
SPEC, the SPEC logo, the SPEC Research Group logo and the tool and benchmark names SERT, SPECjms2007, SPECpower_ssj2008, SPECweb2009 and SPECvirt_sc2010 are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). Reprinted with permission.