Current Trends and Challenges in Big Data Benchmarking
Kai Sachs, SPEC Research Group
May 2014
Benchmark Use Cases & Stakeholders

Hardware & Software Vendors:
  Publish results & marketing
  Example: more than 27,500 results submitted for the SPEC CPU2006 benchmarks alone
Developer:
  Analysis & product quality
  Example: regression performance testing (see the sketch below)
Consumer:
  Compare different products
  Example: find the best video card for gaming
IT Architect:
  Cloud & hardware sizing
  Example: choosing a configuration
Researcher:
  Example: evaluate one's own implementation using a standardized workload
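To make the developer use case concrete, here is a minimal sketch of a regression performance test: it warms up, times an operation repeatedly, and fails if the median latency exceeds a stored baseline by more than a tolerance. The operation, the baseline value and the 10% tolerance are hypothetical, not taken from the slides.

```java
import java.util.Arrays;

// Minimal regression performance test sketch (hypothetical names and thresholds).
public class RegressionPerfTest {

    // Stand-in for the operation under test; replace with real product code.
    static long criticalOperation() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        final int runs = 50;
        long[] samplesNs = new long[runs];

        // Warm-up so the JIT compiles the hot path before we measure.
        for (int i = 0; i < 1_000; i++) criticalOperation();

        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            criticalOperation();
            samplesNs[i] = System.nanoTime() - start;
        }

        Arrays.sort(samplesNs);
        long medianNs = samplesNs[runs / 2];

        // Baseline from a previously accepted build; 10% tolerance (both invented).
        long baselineNs = 60_000;
        double tolerance = 1.10;

        System.out.printf("median = %d ns (baseline = %d ns)%n", medianNs, baselineNs);
        if (medianNs > baselineNs * tolerance) {
            throw new AssertionError("Performance regression: median latency exceeds baseline by >10%");
        }
    }
}
```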
Standard Performance Evaluation Corporation

Founded 1988; more than 80 member organizations & associates

Development of industry-standard benchmarks:
  OSG (Open Systems Group): CPU, Java, Virtualization, Power, …
  HPG (High Performance Group): OpenMP, MPI, …
  GWPG (Graphics and Workstation Performance Group)

Research platform:
  RG (Research Group): Cloud, Intrusion Detection Systems, Big Data
SPEC Research Group – Mission Statement

Provide a platform for collaborative research efforts in the areas of
  computer benchmarking and
  quantitative system analysis
Portal for all kinds of benchmarking-related resources
Provide research benchmarks, tools, metrics and scenarios
Performance

Performance in a broad sense:
  Classical performance metrics
    Examples: response time, throughput, scalability, efficiency, and elasticity
  Non-functional system properties, grouped under the term dependability
    Examples: availability, reliability, and security
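As an illustration of the classical metrics, here is a small sketch (not SPEC tooling) that derives throughput and a response-time mean and percentile from raw measurements; the sample values and the measurement window are invented.

```java
import java.util.Arrays;

// Sketch: deriving classical performance metrics from raw measurements.
public class Metrics {
    public static void main(String[] args) {
        // Hypothetical response times of completed requests, in milliseconds.
        double[] responseMs = {12.1, 9.8, 15.3, 11.0, 48.7, 10.2, 13.9, 9.5, 14.4, 11.8};
        double wallClockSeconds = 2.0; // measurement window (also invented)

        // Throughput: completed operations per unit of time.
        double throughput = responseMs.length / wallClockSeconds;

        // Response-time statistics: mean and 90th percentile (nearest-rank method).
        double mean = Arrays.stream(responseMs).average().orElse(Double.NaN);
        double[] sorted = responseMs.clone();
        Arrays.sort(sorted);
        double p90 = sorted[(int) Math.ceil(0.9 * sorted.length) - 1];

        System.out.printf("throughput = %.1f ops/s, mean = %.1f ms, p90 = %.1f ms%n",
                throughput, mean, p90);
    }
}
```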
Big Data Benchmarking Community (BDBC) – Towards a Big Data Standard Benchmark

'Incubator' for Big Data standard benchmark(s) for industry
  More than 200 members on the mailing list
Workshop on Big Data Benchmarking (WBDB) series
  2012 in San Jose, CA & Pune, India; 2013 in San Jose, CA & Xi'an, China; 2014 in Potsdam, Germany
  Post-proceedings published in LNCS
BDBC is joining the SPEC Research Group
  RG working group focusing on Big Data in preparation
  Working group chairs: Chaitan Baru, Tilmann Rabl

WBDB 2012 Report: Setting the Direction for Big Data Benchmark Standards
C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, T. Rabl. TPCTC 2012, collocated with VLDB 2012
Other Benchmark Organizations

Transaction Processing Performance Council (TPC)
  Focus: transaction processing and database benchmarks
  Best-known benchmarks: TPC-C (OLTP), TPC-E (OLTP), TPC-H (decision support)
Embedded Microprocessor Benchmark Consortium (EEMBC)
  Focus: hardware and software used in embedded systems
Business Applications Performance Corporation (BAPCo)
  Focus: performance benchmarks for personal computers, based on popular applications and industry-standard operating systems
Workshop on Big Data Benchmarking (WBDB) 2014 – Call for Papers

General Chairs: Chaitan Baru (UC San Diego), Tilmann Rabl (U Toronto), Kai Sachs (SAP)
Local Arrangements: Matthias Uflacker (Hasso Plattner Institute)
Publicity Chair: Henning Schmitz (SAP Innovations Center)
Publication Chair: Meikel Poess (Oracle)
Keynote Speakers: Umesh Dayal, Alexandru Iosup
Submission: May 30, 2014 (6pm PDT); short versions of papers (4-8 LNCS pages)

Program Committee:
Milind Bhandarkar (Pivotal), Anja Bog (SAP Labs), Dhruba Borthakur (Facebook), Joos-Hendrik Böse (Amazon), Tobias Bürger (Payback), Tyson Condie (UCLA), Kshitij Doshi (Intel), Pedro Furtado (U Coimbra), Bhaskar Gowda (Intel), Goetz Graefe (HP), Martin Grund (Exascale), Alfons Kemper (TU München), Donald Kossmann (ETH Zürich), Tim Kraska (Brown University), Wolfgang Lehner (TU Dresden), Christof Leng (UC Berkeley), Stefan Manegold (CWI), Raghu Nambiar (Cisco), Manoj K. Nambiar (TCS), Glenn Paulley (Conestoga Col.), Scott Pearson (CLDS Industry Fellow), Andreas Polze (HPI), Alexander Reinefeld (HU Berlin), Berni Schiefer (IBM Labs Toronto), Saptak Sen (Hortonworks), Florian Stegmaier (University of Passau), Till Westmann (Oracle Labs), Jianfeng Zhan (Chinese Academy of Sciences)
Benchmark Engineering
Past & Present

Past: it was common to write a for-loop and call it a benchmark (see the sketch below).
Present: benchmarks are complex pieces of software and specifications, and benchmark development has turned into a complex team effort.
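For illustration, a caricature of the 'past' style: a bare for-loop timed with a wall clock. The comments note why such numbers are unreliable on a modern JVM; the loop body is invented.

```java
// The classic for-loop "benchmark" of the past (illustrative only).
public class ForLoopBenchmark {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        double x = 0;
        for (int i = 1; i <= 10_000_000; i++) {
            x += Math.sqrt(i); // arbitrary work, invented for this sketch
        }
        long elapsed = System.currentTimeMillis() - start;
        // Pitfalls a modern benchmark must address:
        // - no warm-up: the JIT compiles the loop while it is being timed
        // - if x were unused, the JIT could eliminate the loop entirely
        // - a single run: no variance estimate, no confidence in the result
        System.out.println("elapsed = " + elapsed + " ms (x = " + x + ")");
    }
}
```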
The Whetstone Benchmark (1974 – 284 lines)
Curnow, H.J., Wichmann, B.A.: "A Synthetic Benchmark". The Computer Journal, Volume 19, Issue 1, Feb. 1976, pp. 43-49
SPEC CPU Benchmark Suite – Lines of Code
Henning, J.: "SPEC CPU suite growth: an historical perspective". SIGARCH Comput. Archit. News 35, Issue 1, March 2007
Example Components of a Standard Benchmark

A standard benchmark comprises: workload, metrics, run rules, reporter, documentation, and an implementation & framework (optional).
The workload specification is the most important part.

Performance Evaluation of Message-Oriented Middleware Using the SPECjms2007 Benchmark
Kai Sachs, Samuel Kounev, Jean Bacon, Alejandro Buchmann. Performance Evaluation, 2009
Performance Modeling and Benchmarking of Event-Based Systems
Kai Sachs. PhD Thesis, TU Darmstadt, 2010
Workload Requirements

Representativeness
Comprehensiveness
Focus
Scalability
Configurability

Resilience Benchmarking
Marco Vieira, Henrique Madeira, Kai Sachs, Samuel Kounev. In: Resilience Assessment and Evaluation, Springer, 2012
Workload Description ‘Level’
From TPC-C to Big Data Benchmarks: A Functional Workload Model
Yanpei Chen, Francois Raab, and Randy Katz in Workshop on Big Data Benchmarks, 2012.
Current Trends & Challenges in Big Data Benchmarking
Current Trends & Challenges in Benchmarking

Technology:
  Virtualization
  Cloud
  (Big) Data: MapReduce, mixed workloads (OLAP/OLTP), data/event streaming, …
Benchmarking methodology:
  Large-scale systems
Tools:
  Data/workload generators
  Power consumption
  Simulation frameworks
  Generic benchmarking frameworks

[Diagram: a benchmark sits at the intersection of technologies, methodologies, and tools]
Benchmark Methodology: System Under Test

Past & present:
  Single node
  Multiple nodes
  Isolated systems
Benchmark Methodology: System Under Test

[Photo: St. Peter's Square, 2005 vs. 2013 – http://instagram.com/p/W2FCksR9-e/]
Benchmark Methodology: System Under Test

Challenge: large-scale systems
  Isolation is not guaranteed (or is impossible)
  High number of nodes
  Very large data volumes
  Repeatability is an issue
How can we benchmark such systems?
“Big Data should be Interesting Data! There are various definitions of Big Data; most center around a number of V’s like volume, velocity, variety, veracity – in short: interesting data (interesting in at least one aspect). However, when you look into research papers on Big Data, in SIGMOD, VLDB, or ICDE, the data that you see here in experimental studies is utterly boring. Performance and scalability experiments are often based on the TPC-H benchmark: completely synthetic data with a synthetic workload that has been beaten to death for the last twenty years. Data quality, data cleaning, and data integration studies are often based on bibliographic data from DBLP, usually old versions with less than a million publications, prolific authors, and curated records. I doubt that this is a real challenge for tasks like entity linkage or data cleaning. So where’s the – interesting – data in Big Data research?”

Gerhard Weikum: “Where’s the Data in the Big Data Wave?” – SIGMOD Blog, March 2013
Big Data Benchmark: Issues and Challenges

'Big Data World'
Communities
Benchmark design:
  Single benchmark vs. benchmark collection
  Component vs. end-to-end scenario
  Specification vs. implementation
Metric
System under test
Workload
Abstractions of the Big Data World from WBDB

Enterprise warehouse + agglomeration of other data:
  Structured enterprise data warehouse, extended to incorporate data from other, non-fully-structured data sources (e.g. weblogs, text, streams)
Pool of data with a sequence of processing steps:
  Enterprise data processing as a pipeline from data ingestion to transformation, extraction, subsetting, machine learning, and predictive analytics
  Data from multiple structured and non-structured sources

Introduction to the 4th Workshop on Big Data Benchmarking
Chaitan Baru
BigBench: A Big Data Analytics Benchmark – Data Model

Scenario: retail domain
Data:
  Structured: based on TPC-DS
  Semi-structured: click streams
  Unstructured: product reviews
PDGF is used to generate the data

BigBench: Towards an Industry Standard Benchmark for Big Data Analytics
A. Ghazal, M. Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H.-A. Jacobsen. SIGMOD 2013
BigBench: A Big Data Analytics Benchmark – Data Generation for Unstructured Data

Extended version of the Parallel Data Generation Framework (PDGF)
Separate review generator

BigBench: Towards an Industry Standard Benchmark for Big Data Analytics
A. Ghazal, M. Hu, T. Rabl, F. Raab, M. Poess, A. Crolotte, H.-A. Jacobsen. SIGMOD 2013
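The core idea that makes PDGF-style generation scalable and repeatable is that every cell value is a pure, seeded function of its (table, row, column) coordinates, so any node can generate any partition independently and deterministically. The sketch below shows only this idea; it is not PDGF's actual API, and all names and constants are invented.

```java
import java.util.Random;

// Sketch of seeded, coordinate-based data generation (the idea behind PDGF; not its API).
public class DeterministicGenerator {
    static final long GLOBAL_SEED = 42L; // one seed fixes the whole data set

    // The value of cell (table, row, column) is a pure function of its coordinates.
    static long cellValue(int table, long row, int column) {
        long seed = GLOBAL_SEED ^ (table * 0x9E3779B97F4A7C15L)
                                ^ (row * 0xC2B2AE3D27D4EB4FL) ^ column;
        return new Random(seed).nextLong();
    }

    public static void main(String[] args) {
        // Two "nodes" generating disjoint row ranges of table 0 in parallel
        // would produce exactly these rows, in any order, on any run.
        for (long row = 0; row < 4; row++) {
            System.out.printf("row %d: col0=%d col1=%d%n",
                    row, cellValue(0, row, 0), cellValue(0, row, 1));
        }
    }
}
```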
Deep Analytics Pipeline

An end-to-end data processing pipeline:
  Data from multiple sources
  Loose, flexible schema
  Data requires structuring
Application characteristics:
  Processing pipelines
  Running models with data

Introduction to the 4th Workshop on Big Data Benchmarking
Chaitan Baru
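A minimal sketch of the shape of such a pipeline: loosely structured records from several sources are parsed, given a schema, and fed into an analytics stage. All names and the toy data are invented; the point is only the ingestion → structuring → analytics sequence described above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of an end-to-end pipeline: ingest -> structure -> model (all names invented).
public class DeepAnalyticsPipeline {
    public static void main(String[] args) {
        // Ingestion: raw, loosely structured records from multiple sources.
        Stream<String> raw = Stream.of("web|u1|click", "log|u2|view", "web|u1|buy");

        // Structuring: impose a schema on the loose input.
        List<Map<String, String>> events = raw
                .map(line -> line.split("\\|"))
                .map(f -> Map.of("source", f[0], "user", f[1], "action", f[2]))
                .collect(Collectors.toList());

        // Analytics/model stage: here, simply count actions per user.
        Map<String, Long> actionsPerUser = events.stream()
                .collect(Collectors.groupingBy(e -> e.get("user"), Collectors.counting()));

        System.out.println(actionsPerUser); // e.g. {u1=2, u2=1}
    }
}
```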
Example Application: Determining a User Interest Profile by Mining Activities

Scalable Distributed Inference of Dynamic User Interests for Behavioral Targeting
A. Ahmed, Y. Low, M. Aly, V. Josifovski, A.J. Smola. SIGKDD 2011
Composite Benchmark for Transactions and Reporting (CBTR)

OLTP & OLAP benchmark based on current and real enterprise data
Order-to-cash scenario: 18 tables with 5 to 327 columns, 2,316 columns in total
Variable workload mix (see the sketch below):
  OLTP sub-workload share: S_T ∈ [0, 1]
  OLAP sub-workload share: S_A = 1 − S_T
  Read-only share of the OLTP queries: S_T^r ∈ [0, 1]
  Mixed share of the OLTP queries: S_T^m = 1 − S_T^r
  (S: share; T: transactional, A: analytical; r: read-only, m: mixed)

Benchmarking Composite Transaction and Analytical Processing Systems
Anja Bog. PhD Thesis, University of Potsdam, 2012
Interactive Performance Monitoring of a Composite OLTP & OLAP Workload
Anja Bog, Kai Sachs, Hasso Plattner. SIGMOD 2012 (Demo)
Normalization in a Mixed OLTP and OLAP Workload Scenario
Anja Bog, Kai Sachs, Alexander Zeier, Hasso Plattner. TPCTC 2011, collocated with VLDB 2011
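To make the share parameters concrete, here is a sketch (not the actual CBTR driver) that dispatches requests according to S_T, S_A = 1 − S_T, S_T^r and S_T^m = 1 − S_T^r; the concrete share values are invented.

```java
import java.util.Random;

// Sketch of a CBTR-style variable workload mix (not the actual CBTR driver).
public class WorkloadMix {
    public static void main(String[] args) {
        double sT  = 0.7;  // OLTP share S_T (hypothetical value); S_A = 1 - S_T
        double sTr = 0.8;  // read-only OLTP share S_T^r; S_T^m = 1 - S_T^r
        Random rng = new Random(1);

        int oltpRead = 0, oltpMixed = 0, olap = 0;
        for (int i = 0; i < 100_000; i++) {
            if (rng.nextDouble() < sT) {                // OLTP request
                if (rng.nextDouble() < sTr) oltpRead++; // read-only OLTP query
                else oltpMixed++;                       // mixed (read-write) OLTP query
            } else {
                olap++;                                 // analytical (OLAP) query
            }
        }
        System.out.printf("OLTP read-only: %d, OLTP mixed: %d, OLAP: %d%n",
                oltpRead, oltpMixed, olap);
    }
}
```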
Big Data & Cloud Benchmark
Related Work – Virtualization Benchmarking
Other Activities

TPC-BD
  TPC announced a Big Data working group (November 2013)
Graph 500
  Driven by the HPC community
  Cooperating with the SPEC CPU group
  Green Graph 500 list
SPEC OSG
  Big Data as part of a cloud benchmark
Further benchmarks: CloudSuite 2.0, CH-benCHmark, BigDataBench, HiBench, LinkBench, …
SPEC RG – Big Data Working Group: Potential Topics

Target group:
  Researchers & developers
Data categories:
  Structured, unstructured and semi-structured; events & streams; graphs; geospatial, retail, astronomy & genomic data; …
Benchmark scenarios & metrics:
  Realistic use cases & workload mixes
Big Data classification schema
(Research) standard benchmarks:
  BigBench, Deep Analytics Pipeline, …
Data generation:
  Real-world traces & synthetic data; tooling
Conclusions

Benchmarking is more than throughput
Meaningful workloads are most important
More research is needed:
  Benchmarking of large-scale systems
  'Big Data World': workloads & scenarios
  Benchmarks for Big Data

"We Don't Know Enough to Make a Big Data Benchmark Suite"
Yanpei Chen, WBDB 2012
Thank you
Contact information:
Kai Sachs
Email: [email protected]
Disclaimer:
SPEC, the SPEC logo, the SPEC Research Group logo and the tool and benchmark names SERT, SPECjms2007, SPECpower_ssj2008, SPECweb2009 and SPECvirt_sc2010 are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). Reprinted with permission.