DCBench: a Data Center Benchmark Suite
Zhen Jia (贾禛), http://prof.ict.ac.cn/zhenjia/
Institute of Computing Technology, Chinese Academy of Sciences
2nd BPOE workshop, in conjunction with CCF HPC China 2013
October 31, 2013, Guilin
HPC China 2013, 2nd BPOE
Workload Spectrum
CPU intensive, memory intensive, I/O intensive
Figure from Intel
Workload Spectrum
Data Centers
Why Benchmarking?
• Sometimes there is a solution.
Why Benchmarking?
• What about the solution when …
Benchmark’s Role in Computer Science
“Benchmarking is the quantitative foundation of computer system and architecture research, are used to experimentally determine the benefits of new designs.”
-- C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. PACT 2008
State‐of‐Practice Benchmark Suites
SPEC CPU, SPEC Web, HPCC, PARSEC, TPC-C, YCSB, GridMix
Data Centers [1]
• Distinguishing features: massive scale, mixed workloads
• Workload classification:
  – Online services (service), e.g. Web search
  – Offline processing (data analysis), e.g. MapReduce programs
[1] Barroso et al., “The Datacenter as a Computer”, 2009
Previous Work
CloudSuite (Ferdman et al., “Clearing the Clouds”, ASPLOS 2012)
– Six scale-out workloads:
  • Service workloads: Web search, Web serving, Media streaming, Data serving, Software testing
  • Data analysis workload: Data analytic (Bayes)
CloudSuite leans toward service workloads!
Scale‐out Performance of Data Analysis Workloads
• Speedup (figure annotation: CloudSuite data analytic workload, Bayes)
Data analysis workloads are diverse!
Content
• Background and Motivation
• DCBench
• Workload Characterization
DCBench
Workloads: VM operation, scale-out service, data analysis
Web site: http://prof.ict.ac.cn/DCBench/
Released in July 2013
Methodology for Choosing Workloads
• Step 1: Rank main websites and web services according to page views and daily visitors
• Step 2: Decompose the service programs into algorithms and basic operations
• Step 3: Select algorithms and basic operations according to their popularity
Step 1: Ranking
Top Sites on the Web
More details at http://www.alexa.com/topsites/global;0
[Pie chart, Top Sites on the Web: Search Engine 40%, Social Network 25%, Electronic Commerce 15%, Media Streaming 5%, Others 15%]
Step 2: Decomposing
Algorithms in Search Engine
graph mining
grep & segmentation
pagerank
word count
sort
vector calculation
Figure from “The Anatomy of a Large-Scale Hypertextual Web Search Engine”
Algorithms in Recommendation Sub‐systems
Summary of Anatomy of Common Services
• Algorithms used in search: PageRank, segmentation, feature reduction, grep, statistical counting, sort, recommendation, ……
• Algorithms used in social networks: recommendation, clustering, classification, grep, feature reduction, statistical counting, sort, ……
• Algorithms used in electronic commerce: recommendation, association rule mining, warehouse operation, clustering, classification, statistical counting, ……
Step 3: Selecting
Top Operations and Algorithms
Grep, PageRank, Recommendation
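The three steps above can be sketched as a weighted popularity score. This is only an illustration, not the authors' actual procedure. It assumes the pie-chart shares map to categories in legend order (Search Engine 40%, Social Network 25%, Electronic Commerce 15%), takes the algorithm lists from the summary slide, and scores each algorithm by the summed share of the categories that use it. The names `CATEGORY_SHARE`, `ALGORITHMS_BY_CATEGORY`, and `rank_algorithms` are hypothetical.

```python
from collections import defaultdict

# Step 1: share of top sites per category (assumed mapping, in legend order).
CATEGORY_SHARE = {
    "search engine": 40,
    "social network": 25,
    "electronic commerce": 15,
}

# Step 2: algorithms per category, copied from the summary slide.
ALGORITHMS_BY_CATEGORY = {
    "search engine": [
        "pagerank", "segmentation", "feature reduction", "grep",
        "statistical counting", "sort", "recommendation",
    ],
    "social network": [
        "recommendation", "clustering", "classification", "grep",
        "feature reduction", "statistical counting", "sort",
    ],
    "electronic commerce": [
        "recommendation", "association rule mining", "warehouse operation",
        "clustering", "classification", "statistical counting",
    ],
}


def rank_algorithms(shares, algorithms_by_category):
    """Step 3 (hypothetical scoring): sum the share of every category that uses an algorithm."""
    scores = defaultdict(int)
    for category, algorithms in algorithms_by_category.items():
        for algorithm in algorithms:
            scores[algorithm] += shares[category]
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_algorithms(CATEGORY_SHARE, ALGORITHMS_BY_CATEGORY)
for algorithm, score in ranking:
    print(f"{algorithm}: {score}")
```

Under these assumptions the score only reflects how widely an algorithm is used across the three largest categories. The selection shown on the slide presumably involved further judgment that this sketch does not model.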
Main Algorithms in Data Centers
Data center algorithms:
• Basic operation
• Association rule mining
• Classification
• Clustering
• Recommendation
• Warehouse operation
• Feature reduction
• Graph mining
• Vector calculation
• Segmentation
Overview of DCBench
Category | Workload | Programming model | Language | Source
Basic operation | Sort | MapReduce | Java | Hadoop
Basic operation | Wordcount | MapReduce | Java | Hadoop
Basic operation | Grep | MapReduce | Java | Hadoop
Classification | Naïve Bayes | MapReduce | Java | Mahout
Classification | Support Vector Machine | MapReduce | Java | Implemented by ourselves
Cluster | K-means | MapReduce | Java | Mahout
Cluster | K-means | MPI | C++ | IBM PML
Cluster | Fuzzy k-means | MapReduce | Java | Mahout
Cluster | Fuzzy k-means | MPI | C++ | IBM PML
Recommendation | Item-based Collaborative Filtering | MapReduce | Java | Mahout
Association rule mining | Frequent pattern growth | MapReduce | Java | Mahout
Segmentation | Hidden Markov model | MapReduce | Java | Implemented by ourselves
Overview of DCBench (Cont'd)
Category | Workload | Programming model | Language | Source
Warehouse operation | Database operations | MapReduce | Java | Hive-bench
Feature reduction | Principal Component Analysis | MPI | C++ | IBM PML
Feature reduction | Kernel Principal Component Analysis | MPI | C++ | IBM PML
Vector calculation | Paper similarity analysis | All-Pairs | C & C++ | Implemented by ourselves
Graph mining | Breadth-first search | MPI | C++ | Graph500
Graph mining | Pagerank | MapReduce | Java | Mahout
Service | Search engine | C/S | Java | Implemented by ourselves
Service | Auction | C/S | Java | Rubis
Service | Media streaming | C/S | Java | Cloudsuite
Content
• Background and Motivation
• DCBench
• Workload Characterization [2]
[2] Zhen Jia et al., “Characterizing Data Analysis Workloads in Data Centers”, IISWC 2013 (Best Paper)
Compared Benchmarks
Field and workloads:
• Scale-out workloads: CloudSuite v1 (Web search, Data serving, Web serving, Media streaming, Software testing)
• HPC: HPCC (HPL, STREAM, PTRANS, RandomAccess, DGEMM, FFT, Comm)
• CPU: SPEC CPU 2006 (SPEC INT, SPEC FP), PARSEC
• Web: SPEC Web 2005, TPC-W
• Scale-out service workloads share many characteristics with traditional service workloads.
• So we refer to both simply as service workloads.
Breakdown of Executed Instructions
• Analysis workloads have more application-level instructions
• The service workloads have higher percentages of kernel-level instructions
[Figure: kernel vs. application share of executed instructions (0% to 100%), grouped into data analysis and service workloads. Workloads shown, in order: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPEC Web, TPC-W, SPEC FP, SPEC INT, PARSEC, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM]
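The kernel/application split shown in this figure is a simple ratio of instruction counts. A minimal sketch, assuming per-privilege-level counts of executed instructions are already available (for example from hardware performance counters); the function name and the sample counts are hypothetical and are not measurements from the talk.

```python
def instruction_breakdown(kernel_instructions, application_instructions):
    """Return (kernel %, application %) of all executed instructions."""
    total = kernel_instructions + application_instructions
    if total == 0:
        raise ValueError("no instructions were counted")
    kernel_pct = 100.0 * kernel_instructions / total
    return kernel_pct, 100.0 - kernel_pct


# Hypothetical counts, chosen only to illustrate the calculation.
kernel_pct, app_pct = instruction_breakdown(
    kernel_instructions=2_000_000,
    application_instructions=8_000_000,
)
print(f"kernel: {kernel_pct:.1f}%  application: {app_pct:.1f}%")
```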
Architecture Block Diagram
Figure from Intel
Pipeline Stalls
• The service workloads have more RAT (Register Allocation Table) stalls
• The data analysis workloads have more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls
• Front end stalls!
[Figure: pipeline stall breakdown for data analysis and service workloads]
Main reason for pipeline stalls: the memory wall
Figure from: The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms
Reasons for Front End Stalls
• High Icache misses and ITLB misses cause front end stalls
[Figure: L1 Icache misses per K-instruction (axis 0 to 100), data analysis vs. service workloads]
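The metric used here, misses per K-instruction (often abbreviated MPKI), normalizes a miss count by the number of instructions executed, so that runs of different lengths can be compared. A minimal sketch of the calculation; the function name and the counter values are hypothetical, not data from the talk.

```python
def misses_per_kilo_instruction(misses, instructions):
    """Cache (or TLB) misses per 1,000 executed instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


# Hypothetical counter readings for one run.
l1i_mpki = misses_per_kilo_instruction(misses=3_000_000, instructions=100_000_000)
print(f"L1 Icache misses per K-instruction: {l1i_mpki:.1f}")
```

The same function applies unchanged to L2 cache or ITLB miss counts.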
L2 Cache Behaviors
• Data analysis workloads have good L2 cache behaviors
[Figure: L2 cache misses per K-instruction (axis 0 to 100), data analysis vs. service workloads]
LLC behaviors
• Data center workloads
  – Have good LLC behaviors
  – Better than most of the HPC workloads
[Figure: percentage of L2 misses satisfied by L3 (0% to 100%)]
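The metric on this slide, the percentage of L2 misses satisfied by L3, can be derived from two miss counts: an L2 miss that does not also miss in L3 was served by L3. A minimal sketch, assuming both counters cover the same interval and that L3 is only accessed on L2 misses; the function name and the numbers are hypothetical.

```python
def l2_misses_satisfied_by_l3(l2_misses, l3_misses):
    """Percentage of L2 misses that hit in the L3 (last-level) cache."""
    if l2_misses <= 0:
        raise ValueError("l2_misses must be positive")
    if not 0 <= l3_misses <= l2_misses:
        raise ValueError("l3_misses must be between 0 and l2_misses")
    return 100.0 * (l2_misses - l3_misses) / l2_misses


# Hypothetical counter readings for one run.
pct = l2_misses_satisfied_by_l3(l2_misses=1_000_000, l3_misses=150_000)
print(f"{pct:.1f}% of L2 misses were satisfied by L3")
```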
Branch Prediction
• Data analysis workloads have pretty good branch behaviors
• Branches of service workloads are hard to predict
[Figure: branch misprediction ratio (0% to 12%), data analysis vs. service workloads]
Some Observations
• Data analysis workloads are different from scale-out service workloads and traditional workloads
• For data analysis workloads, more application-level instructions are executed
• High Icache and ITLB misses
  – Impact: high percentage of front end stalls
  – Cause: massive scale of software infrastructure, high-level languages, third-party libraries
  – Rethink the design of the Icache or ITLB, or simplify the software stack
• Low-level caches (L2 and LLC) behave well for data analysis workloads
  – Pay more attention to area and energy of caches
• The branch predictor is quite effective
More information: http://prof.ict.ac.cn/DCBench/
Backup
Data Center vs. Big Data
[Diagram relating Data center and Big Data; labels: Big Data Analytic, Scale-out Service, VM Operation, Data Intensive, HPC]
Each Algorithm's Application Scenarios
• Sort
  – Ranking pages according to their importance (PageRank)
  – Sorting pages by ID (Web storage in a database)
• Wordcount
  – Calculating TF-IDF base information, such as term frequency
  – Counting user operations to analyze their social behavior (in Wolfram Alpha)
• Grep
  – Log analysis
  – Web information extraction
  – Fuzzy search
• Naïve Bayes
  – Spam recognition (Spam Filtering with Naive Bayes)
  – Bioinformatics (Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy)
• Support Vector Machine
  – Classification (Question Classification)
  – Image processing (Image annotation)
  – Text categorization
Each Algorithm's Application Scenarios (Cont'd)
• K-means
  – Image processing (Fast image segmentation)
  – High-resolution landform classification
• Item-based Collaborative Filtering
  – Amazon recommender system
• Hidden Markov model
  – Bioinformatics (Protein homology detection)
  – Speech recognition, handwriting recognition
  – Word segmentation
• Frequent pattern growth
  – Market analysis
  – Data mining in business (identifying competitive suppliers in supply chain management)
  – Intrusion detection
  – Query recommendation
• Warehouse operation
  – Taobao Yunti system
  – Facebook
  – Yahoo!
• Principal Component Analysis
  – Computer vision
  – Pattern recognition
  – Face representation and recognition
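The Wordcount scenario listed above (term frequency as a building block for TF-IDF) follows the usual MapReduce pattern: a map phase emits (word, 1) pairs and a reduce phase sums them per word. The sketch below imitates that pattern in plain Python on a single machine. It is not the Hadoop implementation used in DCBench, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in document.lower().split():
        yield word, 1


def reduce_phase(pairs):
    """Sum the counts for each word, as a reducer would do per key."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


def word_count(documents):
    """Run the map phase over all documents, then reduce the emitted pairs."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(pairs)


# Hypothetical input documents.
docs = ["data center benchmark", "data analysis workloads in data centers"]
counts = word_count(docs)
print(counts["data"])
```

In a real MapReduce job the pairs would be shuffled across machines by key between the two phases; here a single dictionary plays that role.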