Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Benchmarking Hive at Yahoo Scale PRESENTED BY Mithun Radhakrishnan ⎪ June 4, 2014 2014 Hadoop Summit, San Jose, California

Transcript of Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 1: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Benchmarking Hive at Yahoo Scale

PRESENTED BY Mithun Radhakrishnan ⎪ June 4, 2014

2014 Hadoop Summit, San Jose, California

Page 2: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


About myself

HCatalog Committer, Hive contributor
› Metastore, Notifications, HCatalog APIs
› Integration with Oozie, Data Ingestion
Other odds and ends
› DistCp

[email protected]

Page 3: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

About this talk

Introduction to “Yahoo Scale”
The use-case in Yahoo
The Benchmark
The Setup
The Observations (and, possibly, lessons)
Fisticuffs

Page 4: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

The Y!Grid

16 Hadoop Clusters in YGrid
› 32,500 Nodes
› 750K jobs a day
Hadoop 0.23.10.x, 2.4.x
Large Datasets
› Daily, hourly, minute-level frequencies
› Terabytes of data, 1000s of files, per dataset instance
Pig 0.11
Hive 0.10 / HCatalog 0.5
› => Hive 0.12

Page 5: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Data Processing Use cases


Pig for Data Pipelines
› Imperative paradigm
› ~45% Hadoop Jobs on Production Clusters
  • M/R + Oozie = 41%
Hive for Ad hoc queries
› SQL
› Relatively smaller number of jobs
  • *Major* Uptick
Use HCatalog for Inter-op

Page 6: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


Hive is Currently the Fastest Growing Product on the Grid

[Chart: monthly from Mar-13 to May-14. Left axis: All Grid Jobs (in millions), 0 to 30,000,000. Right axis: Hive Jobs (% of All Jobs), 0.0% to 10.0%. Series: All Jobs; Hive (% of all jobs).]

2.4 million Hive jobs

Page 7: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


Business Intelligence Tools

{Tableau, MicroStrategy, Excel, … }
Challenges:
› Security
  • ACLs, Authentication, Encryption over the wire, Full-disk Encryption
› Bandwidth
  • Transporting results over ODBC
› Query Latency
  • Query execution time
  • Cost of query “optimizations”
  • “Bad” queries

Page 8: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

The Benchmark

TPC-h
› Industry standard (tpc.org/tpch)
› 22 queries
› dbgen -s 1000 -S 3
  • Parallelizable
Reynold Xin’s excellent work:
› https://github.com/rxin
› Transliterated queries to suit Hive 0.9

Page 9: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Relational Diagram


Table (column prefix), cardinality: columns

PART (P_), SF*200,000: PARTKEY, NAME, MFGR, BRAND, TYPE, SIZE, CONTAINER, RETAILPRICE, COMMENT
PARTSUPP (PS_), SF*800,000: PARTKEY, SUPPKEY, AVAILQTY, SUPPLYCOST, COMMENT
LINEITEM (L_), SF*6,000,000: ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT
ORDERS (O_), SF*1,500,000: ORDERKEY, CUSTKEY, ORDERSTATUS, TOTALPRICE, ORDERDATE, ORDER-PRIORITY, SHIP-PRIORITY, CLERK, COMMENT
CUSTOMER (C_), SF*150,000: CUSTKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, MKTSEGMENT, COMMENT
SUPPLIER (S_), SF*10,000: SUPPKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, COMMENT
NATION (N_), 25: NATIONKEY, NAME, REGIONKEY, COMMENT
REGION (R_), 5: REGIONKEY, NAME, COMMENT
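The table cardinalities in the diagram grow linearly with the scale factor (SF), except for NATION and REGION, which are fixed. As a rough sketch, the nominal row counts for each benchmark size can be worked out as below. It assumes the usual TPC-H convention that SF=1 is roughly 1 GB of raw text, so that `dbgen -s 1000` corresponds to the 1 TB case; the names `ROWS_PER_SF`, `FIXED_ROWS` and `nominal_rows` are illustrative and not part of dbgen or Hive. Real dbgen output for LINEITEM is only approximately SF*6,000,000.

```python
# Nominal TPC-H row counts per unit of scale factor, from the relational diagram.
# All names here are illustrative; they do not come from dbgen or Hive.
ROWS_PER_SF = {
    "PART": 200_000,
    "PARTSUPP": 800_000,
    "LINEITEM": 6_000_000,   # approximate in real dbgen output
    "ORDERS": 1_500_000,
    "CUSTOMER": 150_000,
    "SUPPLIER": 10_000,
}
# NATION and REGION do not scale.
FIXED_ROWS = {"NATION": 25, "REGION": 5}


def nominal_rows(scale_factor):
    """Return {table: nominal row count} for a given TPC-H scale factor."""
    rows = {table: per_sf * scale_factor for table, per_sf in ROWS_PER_SF.items()}
    rows.update(FIXED_ROWS)
    return rows


if __name__ == "__main__":
    # Assumption: SF 100 / 1000 / 10000 map to the 100 GB / 1 TB / 10 TB runs.
    for sf in (100, 1000, 10000):
        counts = nominal_rows(sf)
        print(f"SF={sf}: LINEITEM={counts['LINEITEM']:,} ORDERS={counts['ORDERS']:,}")
```

At SF=1000 this puts LINEITEM at a nominal six billion rows, which is why most of the query cost in the benchmark sits in scans and joins over that one table.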

Page 10: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


The Setup

› 350 Node cluster
  • Xeon boxen: 2 Slots with E5530s => 16 CPUs
  • 24GB memory
    – NUMA enabled
  • 6 SATA drives, 2TB, 7200 RPM Seagates
  • RHEL 6.4
  • JRE 1.7 (-d64)
  • Hadoop 0.23.7+/2.3+, Security turned off
  • Tez 0.3.x
  • 128MB HDFS block-size
› Downscale tests: 100 Node cluster
  • hdfs-balancer.sh

Page 11: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

The Prep

Data generation:
› Text data: dbgen on MapReduce
› Transcode to RCFile and ORC: Hive on MR
  • insert overwrite table orc_table partition( … ) select * from text_table;
› Partitioning:
  • Only for 1TB, 10TB cases
  • Perils of dynamic partitioning
› ORC File:
  • 64MB stripes, ZLIB Compression


Page 12: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Observations

Page 13: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 14: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

100 GB

› 18x speedup over Hive 0.10 (Textfile)
  • 6-50x
› 11.8x speedup over Hive 0.10 (RCFile)
  • 5-30x
› Average query time: 28 seconds
  • Down from 530 seconds (Hive 0.10 Text)
› 85% queries completed in under a minute

Page 15: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 16: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


1 TB

› 6.2x speedup over Hive 0.10 (RCFile)
  • Between 2.5-17x
› Average query time: 172 seconds
  • Between 5-947 seconds
  • Down from 729 seconds (Hive 0.10 RCFile)
› 61% queries completed in under 2 minutes
› 81% queries completed in under 4 minutes

Page 17: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 18: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


10 TB

› 6.2x speedup over Hive 0.10 (RCFile)
  • Between 1.6-10x
› Average query time: 908 seconds (426 seconds excluding outliers)
  • Down from 2129 seconds with Hive 0.10 RCFile
    – (1712 seconds excluding outliers)
› 61% queries completed in under 5 minutes
› 71% queries completed in under 10 minutes
› Q6 still completes in 12 seconds!
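The average query times quoted on the 100 GB, 1 TB and 10 TB slides can be turned into ratios directly. The sketch below does only that arithmetic, using figures from the slides and assuming the 100 GB baseline of 530 is in seconds. The ratios it prints are not the same as the headline speedups (for example 6.2x at 1 TB); one plausible reading, which the slides do not confirm, is that the headline figures average the per-query speedups rather than dividing one mean time by another. `AVERAGES` and `ratio_of_averages` are illustrative names.

```python
# Average query times in seconds, as quoted on the slides.
# Each entry: (label, Hive 0.10 baseline average, new average).
AVERAGES = [
    ("100 GB vs Hive 0.10 Text", 530, 28),
    ("1 TB vs Hive 0.10 RCFile", 729, 172),
    ("10 TB vs Hive 0.10 RCFile", 2129, 908),
    ("10 TB vs Hive 0.10 RCFile, excluding outliers", 1712, 426),
]


def ratio_of_averages(baseline_seconds, new_seconds):
    """Ratio of mean query times: how many times shorter the new average is."""
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    for label, baseline, new in AVERAGES:
        print(f"{label}: {ratio_of_averages(baseline, new):.2f}x")
```

A mean of per-query ratios weights every query equally, whereas a ratio of means is dominated by the longest-running queries, so the two can diverge considerably on a workload with a few slow outliers.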

Page 19: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Explaining the speed-ups

Hadoop 2.x, et al.
Tez
› (Arbitrary DAG)-based Execution Engine
› “Playing the gaps” between M&R
  • Temporary data and the HDFS
› Feedback loop
› Smart scheduling
› Container re-use
› Pipelined job start-up
Hive
› Statistics
› “Vector-ized” Execution
ORC
› PPD

Page 20: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 21: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


ORC File Layout

Data is composed of multiple streams per column

Index allows for skipping rows (default to every 10,000 rows), keeping position in each stream, and min-max for each column

Footer contains directory of stream locations, and the encoding for each column

Integer columns are serialized using run-length encoding

String columns are serialized using dictionary for column values, and the same run length encoding

Stripe footer is used to find the requested column’s data streams and adjacent stream reads are merged
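The two column encodings mentioned above can be illustrated with a deliberately simplified sketch. This is not ORC's actual on-disk format, whose run-length encoding is more elaborate; it only shows the idea of collapsing repeated values into (value, run length) pairs, and of replacing strings with indices into a dictionary before run-length encoding those indices. All function names are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def dictionary_encode(strings):
    """Map strings to indices in a sorted dictionary, then run-length encode the indices."""
    dictionary = sorted(set(strings))
    index_of = {s: i for i, s in enumerate(dictionary)}
    return dictionary, rle_encode([index_of[s] for s in strings])


def dictionary_decode(dictionary, pairs):
    """Invert dictionary_encode."""
    return [dictionary[i] for i in rle_decode(pairs)]


if __name__ == "__main__":
    zips = [94089, 94089, 94089, 95110, 95110]
    print(rle_encode(zips))
    states = ["CA", "CA", "NY", "CA"]
    print(dictionary_encode(states))
```

Both encodings pay off most when a column has long runs or few distinct values, which is one reason sorted or low-cardinality columns tend to compress well in a columnar layout.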

Page 22: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


ORC Usage

CREATE TABLE addresses (
  name string,
  street string,
  city string,
  state string,
  zip int
)
STORED AS orc
LOCATION '/path/to/addresses'
TBLPROPERTIES ("orc.compress" = "ZLIB");

ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT orc

SET hive.default.fileformat = orc
SET hive.exec.orc.memory.pool = 0.50
(ORC writer is allowed 50% of JVM heap size by default)

ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

Key | Default | Comments
orc.compress | ZLIB | high-level compression (one of NONE, ZLIB, SNAPPY)
orc.compress.size | 262,144 (256 KB) | number of bytes in each compression chunk
orc.stripe.size | 67,108,864 (64 MB) | number of bytes in each stripe. Each ORC stripe is processed in one map task (try 32 MB to cut down on disk I/O)
orc.row.index.stride | 10,000 | number of rows between index entries (must be >= 1,000). A larger stride-size increases the probability of not being able to skip the stride, for a predicate.
orc.create.index | true | whether to create row indexes. This is for predicate push-down (bloom-filters). If data is frequently accessed/filtered on a certain column, then sorting on the column and using index-filters makes column filters work faster
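The effect of the index stride and of sorting can be seen in a small model of min/max skipping. The sketch below is an illustration only, not ORC's reader: it keeps a (min, max) pair per group of `stride` rows and counts how many rows an equality predicate would still have to read. The function names and the synthetic column data are assumptions made for the example.

```python
def build_row_index(values, stride):
    """Return [(start, end, min, max)] for each group of `stride` consecutive rows."""
    index = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        index.append((start, start + len(chunk), min(chunk), max(chunk)))
    return index


def rows_read_for_equals(values, stride, target):
    """Rows that must be read for `column = target` when groups are skipped by min/max."""
    return sum(
        end - start
        for start, end, lo, hi in build_row_index(values, stride)
        if lo <= target <= hi  # the group cannot be ruled out, so it is read in full
    )


if __name__ == "__main__":
    n = 100_000
    sorted_column = list(range(n))
    # A fixed permutation of 0..n-1 that scatters values across the file.
    scattered_column = [(i * 7919) % n for i in range(n)]
    for stride in (1_000, 10_000):
        print(
            stride,
            rows_read_for_equals(sorted_column, stride, 50_000),
            rows_read_for_equals(scattered_column, stride, 50_000),
        )
```

On the sorted column only the one group containing the target is read, so a smaller stride reads fewer rows. On the scattered column every group's min/max range covers the target and nothing can be skipped, whatever the stride.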

Page 23: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 24: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 25: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


Configuring ORC

set hive.merge.mapredfiles=true
set hive.merge.mapfiles=true
set orc.stripe.size=67108864
› Half the HDFS block-size
  • Prevent cross-block stripe-read
  • Tangent: DistCp
set orc.compress=???
› Depends on size and distribution
› Snappy compression hasn’t been explored
YMMV
› Experiment
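The reasoning behind choosing a stripe size of half the HDFS block size can be checked with a small model. The sketch below counts how many stripes would span an HDFS block boundary under simplifying assumptions that are not from the talk: stripes are laid out back to back from offset 0, and each is exactly `stripe_size` bytes apart from a possibly shorter last one. Real ORC files have headers, footers and variable stripe lengths, so treat this as an illustration of the alignment argument only. The function name is hypothetical.

```python
MB = 1024 * 1024


def stripes_crossing_blocks(stripe_size, block_size, file_size):
    """Count stripes whose byte range spans an HDFS block boundary.

    Assumes stripes are contiguous from offset 0 and exactly `stripe_size`
    bytes long, except for a possibly shorter final stripe.
    """
    crossing = 0
    for start in range(0, file_size, stripe_size):
        end = min(start + stripe_size, file_size)  # exclusive end offset
        if start // block_size != (end - 1) // block_size:
            crossing += 1
    return crossing


if __name__ == "__main__":
    block = 128 * MB  # HDFS block size used in the benchmark setup
    for stripe_mb in (64, 96, 256):
        n = stripes_crossing_blocks(stripe_mb * MB, block, 768 * MB)
        print(f"{stripe_mb} MB stripes: {n} cross a block boundary")
```

Under these assumptions, any stripe size that divides the block size evenly keeps every stripe inside one block, so a task reading a stripe never has to fetch part of it from a second block.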

Page 26: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Page 27: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Conclusions

Page 28: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


Y!Grid sticking with Hive

Familiarity
› Existing ecosystem
Community
Scale
Multitenant
Coming down the pike
› CBO
› In-memory caching solutions atop HDFS
  • RAMfs a la Tachyon?

Page 29: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

We’re not done yet

SQL compliance
Scaling up the metastore performance
Better BI Tool integration
Faster transport
› HiveServer2 result-sets

Page 30: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

References

The YDN blog post:
› http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn
Code:
› https://github.com/mythrocks/hivebench (TPC-h scripts, datagen, transcode utils)
› https://github.com/t3rmin4t0r/tpch-gen (Parallel TPC-h gen)
› https://github.com/rxin/TPC-H-Hive (TPC-h scripts for Hive)
› https://issues.apache.org/jira/browse/HIVE-600 (Yuntao’s initial TPC-h JIRA)


Page 31: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

Thank You

@mithunrk

[email protected]

We are hiring!

Stop by Kiosk P9 or reach out to us at [email protected].

Page 32: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale

I’m glad you asked.

Page 33: Hadoop Summit 2014 : Benchmarking Apache Hive at Yahoo Scale


Sharky comments

Testing with Shark 0.7.x and Shark 0.8
› Compatible with Hive Metastore 0.9
› 100GB datasets: Admirable performance
› 1TB/10TB: Tests did not run completely
  • Failures, especially in 10TB cases
  • Hangs while shuffling data
  • Scaled back to 100 nodes -> More tests ran through, but not completely
› nReducers: Not inferred
Miscellany
› Security
› Multi-tenancy
› Compatibility
