The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.
![Page 1: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/1.jpg)
Demystifying Systems for Interactive and Real-time Analytics
The BigFrame Team
Duke University, Hong Kong Polytechnic University, and HP Labs
![Page 2: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/2.jpg)
Analytics System Landscape
Categories: MPP DB, Columnar, MapReduce, Mixed, Dataflow, Streaming, Text Analytics, Array DB, Graph, Multi-tenant
![Page 3: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/3.jpg)
Analytics System Landscape
Systems shown: Gamma, Aster, Netezza, DB2 PE, Teradata, SQL Server Parallel Data Warehouse, Greenplum
![Page 4: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/4.jpg)
Analytics System Landscape
Systems shown: HP Vertica, ParAccel, Redshift, Vectorwise
![Page 5: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/5.jpg)
Analytics System Landscape
Systems shown: Hadoop, Tenzing, Hive, Mahout, HadoopDB, Pig
![Page 6: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/6.jpg)
Analytics System Landscape
Systems shown: Dremel, Drill, Stinger, Impala, Spark, Dryad, SCOPE
![Page 7: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/7.jpg)
Analytics System Landscape
Systems shown: Cassandra, HBase, Bigtable, Druid, HANA, Spanner, Megastore, Splunk
![Page 8: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/8.jpg)
Analytics System Landscape
Systems shown: Storm, GraphLab, Streambase, Cassovary, GraphX, Solr, ElasticSearch, SciDB, Cloudera Search, MadLINQ, Pregel, HAMA
![Page 9: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/9.jpg)
Analytics System Landscape
Systems shown: Mesos, YARN, Serengeti, Cloud platforms
![Page 10: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/10.jpg)
What does this mean for Big Data Practitioners?
![Page 11: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/11.jpg)
Gives them a lot of power!
From: http://animeonly.org/Digital-Wallpapers/Digital-renders/Spiderman-95061p.html
![Page 12: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/12.jpg)
Even the mighty may need a little help
![Page 13: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/13.jpg)
Challenges for Practitioners
App Developers, Data Scientists: Which system to use for the app that I am developing?
• Features (e.g., graph data)
• Performance (e.g., claims like System A is 50x faster than B)
• Resource efficiency
• Growth and scalability
• Multi-tenancy
![Page 14: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/14.jpg)
Challenges for Practitioners
App Developers, Data Scientists: Which system to use for the app that I am developing?
Different parts of my app have different requirements
Compose “best of breed” systems OR use “one size fits all” system?
System Admins: Managing many systems is hard!
![Page 15: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/15.jpg)
Challenges for Practitioners
App Developers, Data Scientists: Which system to use for the app that I am developing?
Different parts of my app have different requirements
System Admins: Managing many systems is hard!
CIO: Total Cost of Ownership (TCO)?
![Page 16: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/16.jpg)
Numbers make decisions easier
![Page 17: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/17.jpg)
Need benchmarks
![Page 18: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/18.jpg)
One Approach
1. Categorize systems
2. Develop a benchmark per system category
![Page 19: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/19.jpg)
Useful, But …
Existing benchmarks across the system categories (MPP DB, Columnar, MapReduce, Mixed, Dataflow, Streaming, Text Analytics, Array DB, Graph, Multi-tenant):
• Star Schema Benchmark
• TPC-H / TPC-DS
• Counting triangles
• Terasort
• GridMix
• SWIM
• HiBench
• DFSIO
• MapReduce vs. Parallel DB / Hive Benchmark (in HiBench) / Berkeley Big Data Benchmark
• Yahoo Cloud Serving Benchmark (YCSB)
• YCSB Variants
• CH-benCHmark
• MulTe
• Graph 500
• PageRank
• RDF Benchmarks
• Information Extraction Benchmark
• Linear Road
• SS-DB
![Page 20: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/20.jpg)
Problem #1: May Miss the Big Picture
![Page 21: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/21.jpg)
Problem #1: May Miss the Big Picture
Cannot capture the complexities and end-to-end behavior of big data applications and deployments:
(i) Bottlenecks
(ii) Data conversion, transfer, & loading overheads
(iii) Storage costs & other parts of the data life-cycle
(iv) Resource management challenges
(v) Total Cost of Ownership (TCO)
![Page 22: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/22.jpg)
Give a man a fish and you will feed him for a day.
Give him fishing gear and you will feed him for life.
-- Anonymous
Problem #2
Benchmark vs. Benchmark Generator
![Page 23: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/23.jpg)
BigFrame: A Benchmark Generator for Big Data Analytics
![Page 24: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/24.jpg)
How a user uses BigFrame
User → BigFrame Interface → bigif (benchmark input format) → Benchmark Generator → bspec (benchmark specification) → Benchmark Driver for System Under Test
The Benchmark Driver runs the benchmark on the System Under Test (HBase, Hive, MapReduce) and collects the results
![Page 25: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/25.jpg)
bspec: Benchmark Specification
1. Data for initial load
2. Data refresh pattern (over time)
3. Query streams
4. Evaluation metrics
System Under Test: HBase, Hive, MapReduce
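As a rough illustration of the four bspec components, here is a minimal Python sketch. The class and field names (`BenchmarkSpec`, `QueryStream`, `initial_load`, and so on) and the row counts are assumptions made for this example; the slides do not show BigFrame's actual bspec format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryStream:
    """One stream of queries issued in order against the system under test."""
    name: str
    queries: List[str]


@dataclass
class BenchmarkSpec:
    """Hypothetical in-memory form of a bspec; all field names are illustrative."""
    initial_load: Dict[str, int]            # 1. data set name -> rows loaded up front
    refresh_pattern: List[Dict[str, int]]   # 2. one entry per refresh: data set -> new rows
    query_streams: List[QueryStream]        # 3. query streams
    metrics: List[str] = field(default_factory=lambda: ["elapsed_seconds"])  # 4. metrics

    def total_rows(self) -> int:
        """Rows loaded initially plus rows added by every refresh."""
        refreshed = sum(sum(batch.values()) for batch in self.refresh_pattern)
        return sum(self.initial_load.values()) + refreshed


# Made-up sizes, only to exercise the structure.
spec = BenchmarkSpec(
    initial_load={"web_sales": 1000, "tweets": 5000},
    refresh_pattern=[{"tweets": 200}, {"tweets": 300, "web_sales": 50}],
    query_streams=[QueryStream("stream-0", ["SELECT COUNT(*) FROM web_sales"])],
)
print(spec.total_rows())
```

Keeping the refresh pattern as an ordered list is one simple way to capture the time axis of the specification.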
![Page 26: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/26.jpg)
What does the user (want to) specify?
BigFrame Interface → bigif (benchmark input format)
![Page 27: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/27.jpg)
The 3Vs
Volume, Variety, Velocity
![Page 28: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/28.jpg)
bigif: BigFrame’s InputFormat
Data Variety: Relational, text, array, graph
Data Volume: Small, medium, large
Data Velocity: At rest, slow, fast
Query Variety: Micro, Macro
Query Volume: Query concurrency & classes
Query Velocity: Exploratory, Continuous
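The six bigif dimensions can be pictured as a small typed record. The sketch below is an assumption about how such an input might be modeled in Python; the names `Bigif`, `DataType`, `query_concurrency`, and the validation rules are hypothetical and are not taken from BigFrame itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class DataType(Enum):
    RELATIONAL = "relational"
    TEXT = "text"
    ARRAY = "array"
    GRAPH = "graph"


class DataVolume(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


class DataVelocity(Enum):
    AT_REST = "at rest"
    SLOW = "slow"
    FAST = "fast"


class QueryVariety(Enum):
    MICRO = "micro"
    MACRO = "macro"


class QueryVelocity(Enum):
    EXPLORATORY = "exploratory"
    CONTINUOUS = "continuous"


@dataclass(frozen=True)
class Bigif:
    """Hypothetical bigif record: one value (or set) per dimension."""
    data_variety: FrozenSet[DataType]
    data_volume: DataVolume
    data_velocity: DataVelocity
    query_variety: QueryVariety
    query_velocity: QueryVelocity
    query_concurrency: int = 1  # Query Volume, reduced here to a stream count

    def __post_init__(self) -> None:
        if not self.data_variety:
            raise ValueError("data_variety must name at least one data type")
        if self.query_concurrency < 1:
            raise ValueError("query_concurrency must be at least 1")


# An exploratory, relational-only workload; volume and velocity are assumed values.
exploratory_bi = Bigif(
    data_variety=frozenset({DataType.RELATIONAL}),
    data_volume=DataVolume.LARGE,
    data_velocity=DataVelocity.AT_REST,
    query_variety=QueryVariety.MICRO,
    query_velocity=QueryVelocity.EXPLORATORY,
)
print(exploratory_bi)
```

Data Variety is modeled as a set because a single application may mix several data types.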
![Page 29: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/29.jpg)
Benchmark Generation
bigif (benchmark input format) → Benchmark Generator → bspec (benchmark specification)
bigif describes points in a discrete space of {Data, Query} × {Variety, Volume, Velocity}
bspec contains:
1. Initial data to load
2. Data refresh pattern
3. Query streams
4. Evaluation metrics
Benchmark generation can be addressed as a search problem within a rich application domain
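One simple reading of "points in a discrete space" is sketched below: enumerate every combination of dimension values and filter by what the user requires. The slides do not describe the actual search procedure, so the `points` and `search` helpers are hypothetical. Two simplifications are assumed: Data Variety is a single type rather than a set, and Query Volume is reduced to three example concurrency levels.

```python
from itertools import product

# Dimension values follow the bigif dimensions; the query_volume levels are
# an assumption made only for this sketch.
SPACE = {
    "data_variety": ["relational", "text", "array", "graph"],
    "data_volume": ["small", "medium", "large"],
    "data_velocity": ["at rest", "slow", "fast"],
    "query_variety": ["micro", "macro"],
    "query_volume": [1, 4, 16],
    "query_velocity": ["exploratory", "continuous"],
}


def points(space):
    """Yield every point of the discrete space as a dict."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def search(space, **required):
    """Return the points that match all of the required dimension values."""
    return [p for p in points(space)
            if all(p[k] == v for k, v in required.items())]


all_points = list(points(SPACE))
dashboards = search(SPACE, data_velocity="fast", query_velocity="continuous")
print(len(all_points), len(dashboards))
```

Exhaustive enumeration is only workable because the space is small; a richer application domain would likely need a guided search.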
![Page 30: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/30.jpg)
Application Domain Modeled Currently
• E-commerce sales, promotions, recommendations
• Social media sentiment & influence
Benchmark generation can be addressed as a search problem within a rich application domain
![Page 31: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/31.jpg)
Application Domain Modeled Currently
Data sets: Item, Customer, Web_sales, Promotion, Tweets, Relationships
![Page 32: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/32.jpg)
Application Domain Modeled Currently
Data sets: Item, Web_sales, Promotion
![Page 33: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/33.jpg)
Application Domain Modeled Currently
![Page 34: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/34.jpg)
Benchmark Generation
bigif (benchmark input format) → Benchmark Generator → bspec (benchmark specification)
bigif describes points in a discrete space of {Data, Query} × {Variety, Volume, Velocity}
bspec contains:
1. Initial data to load
2. Data refresh pattern
3. Query streams
4. Evaluation metrics
BigFrame can generate Data, Queries, and Arrival Patterns with the user-specified {Variety, Volume, Velocity} requirements from the application domain
![Page 35: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/35.jpg)
Use Cases of BigFrame
![Page 36: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/36.jpg)
Use Case I: Exploratory BI
• Large volumes of relational data
• Mostly aggregation and few joins
• Can Spark’s performance match that of an MPP DB?
Data Variety = {Relational}
Query Variety = Micro
BigFrame will generate a benchmark specification containing relational data and (SQL-ish) queries
![Page 37: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/37.jpg)
Use Case II: Complex BI
• Large volumes of relational data
• Even larger volumes of text data
• Combined analytics
Data Variety = {Relational, Text}
Query Variety = Macro (application-focused instead of micro-benchmarking)
BigFrame will generate a benchmark specification that includes sentiment analysis tasks over tweets
![Page 38: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/38.jpg)
Use Case III: Dashboards
• Large volume and velocity of relational and text data
• Continuously-updated Dashboards
Data Velocity = Fast
Query Velocity = Continuous (as opposed to Exploratory)
BigFrame will generate a benchmark specification that includes data refresh as well as continuous queries whose results change upon data refresh
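To make "continuous queries whose results change upon data refresh" concrete, here is a minimal sketch in plain Python. The function `run_continuous_count` and the sample data are hypothetical; the sketch only shows the idea of re-evaluating a query after each refresh and is not how BigFrame or any system under test implements it.

```python
from collections import Counter
from typing import Dict, Iterable, List


def run_continuous_count(initial: Iterable[str],
                         refreshes: List[List[str]]) -> List[Dict[str, int]]:
    """Re-evaluate a per-key count after the initial load and after each refresh.

    Returns one result snapshot per evaluation, so the caller can see how the
    answer changes as new data arrives.
    """
    counts = Counter(initial)
    snapshots = [dict(counts)]
    for batch in refreshes:
        counts.update(batch)             # apply the data refresh
        snapshots.append(dict(counts))   # continuous query result after the refresh
    return snapshots


# Hypothetical data: each string is the product mentioned by one tweet.
snapshots = run_continuous_count(
    initial=["phone", "laptop", "phone"],
    refreshes=[["laptop"], ["phone", "tablet"]],
)
for snapshot in snapshots:
    print(snapshot)
```

A benchmark driver could time each re-evaluation to measure how quickly a dashboard reflects newly arrived data.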
![Page 39: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/39.jpg)
Use Case IV: Does One Size Fit All?
• A growing set of applications has to process relational, text, & graph data
• Compose “best of breed” systems or use a “one size fits all” system?
Data Variety = {Relational, Text, Graph}
Query Variety = Macro
BigFrame will generate a benchmark specification that includes composite workflows with relational, text, and graph analytics
![Page 40: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/40.jpg)
Use Case V: Multi-tenancy and SLAs
• Big data deployments are increasingly multi-tenant and need to meet SLAs
Specified through Query Volume dimension
BigFrame can generate a benchmark specification containing a specified number of concurrent query streams with class labels for queries (e.g., Batch, Interactive, or Streaming)
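A specified number of concurrent query streams with class labels could be sketched as follows. The `make_streams` helper, the `LabeledQuery` type, and the placeholder query names are assumptions for illustration; only the three class labels come from the slide.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

QUERY_CLASSES = ("Batch", "Interactive", "Streaming")  # labels named on the slide


@dataclass(frozen=True)
class LabeledQuery:
    stream_id: int
    query_class: str
    text: str


def make_streams(num_streams: int, queries_per_stream: int,
                 templates: Dict[str, List[str]],
                 seed: int = 0) -> List[List[LabeledQuery]]:
    """Build num_streams query streams; each query carries a class label."""
    rng = random.Random(seed)  # seeded so the generated streams are repeatable
    streams = []
    for stream_id in range(num_streams):
        stream = []
        for _ in range(queries_per_stream):
            query_class = rng.choice(QUERY_CLASSES)
            text = rng.choice(templates[query_class])
            stream.append(LabeledQuery(stream_id, query_class, text))
        streams.append(stream)
    return streams


# Placeholder query names; real query texts would come from the generated bspec.
TEMPLATES = {
    "Batch": ["nightly_sales_rollup"],
    "Interactive": ["top_items_by_region"],
    "Streaming": ["sentiment_per_minute"],
}

streams = make_streams(num_streams=3, queries_per_stream=4, templates=TEMPLATES)
for stream in streams:
    print([q.query_class for q in stream])
```

With class labels attached to every query, a driver can report per-class latency, which is what SLA checks in a multi-tenant deployment typically need.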
![Page 41: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/41.jpg)
Working with the Community
• First release of BigFrame planned for August 2013
• With feedback from benchmark developers (BigBench)
• Open-source with extensibility APIs
• Benchmark Drivers for more systems
• Utilities (accessed through the Benchmark Driver to drill down into system behavior during benchmarking)
• Instantiate the BigFrame pipeline for more app domains
![Page 42: The BigFrame Team Duke University, Hong Kong Polytechnic University, and HP Labs.](https://reader036.fdocuments.us/reader036/viewer/2022081518/5518a52c550346991f8b4a32/html5/thumbnails/42.jpg)
Take Away
• “Benchmarks shape a field (for better or worse) …” -- David Patterson, Univ. of California, Berkeley
• Benchmarks meet different needs for different people
• End customers, application developers, system designers, system administrators, researchers, CIOs
• BigFrame helps users generate benchmarks that best meet their needs