© Hortonworks Inc. 2013.

MR / Tez Query Comparison

HW = 20 Node (48 GB RAM, 6x disk)

SW = Hive Trunk (Nov 13 2013) + ORCFile + Vectorization

| Query | Hive Trunk M/R (s) | Hive Trunk Tez, Cold (s) | Hive Trunk Tez, Hot (s) | Tez Relative Gain (%) | Hot / Cold Gain (%) |
|---|---|---|---|---|---|
| query12 | 297.5 | 20.0 | 9.7 | 2958.8% | 105.2% |
| query15 | 75.9 | 62.6 | 58.7 | 29.4% | 6.7% |
| query21 | 39.3 | 52.4 | 46.9 | -16.2% | 11.7% |
| query26 | 46.9 | 33.0 | 23.3 | 101.0% | 41.3% |
| query27 | 37.6 | 17.7 | 8.4 | 348.8% | 111.8% |
| query28 | 39.9 | 24.2 | 12.5 | 218.4% | 93.4% |
| query3 | 58.9 | 23.9 | 16.1 | 265.9% | 48.7% |
| query34 | 87.7 | 31.7 | 25.3 | 246.6% | 25.3% |
| query39 | 234.5 | 62.8 | 55.5 | 322.1% | 13.1% |
| query43 | 57.0 | 34.4 | 26.0 | 119.2% | 32.4% |
| query46 | 103.3 | 46.1 | 30.9 | 234.4% | 49.2% |
| query52 | 59.9 | 23.5 | 15.6 | 285.3% | 51.3% |
| query55 | 59.3 | 27.7 | 18.3 | 224.1% | 51.4% |
| query67 | 820.9 | 821.1 | 787.2 | 4.3% | 4.3% |
| query68 | 102.7 | 53.1 | 42.2 | 143.2% | 25.9% |
| query7 | 47.9 | 27.7 | 18.9 | 153.1% | 46.2% |
| query73 | 87.7 | 29.7 | 22.7 | 287.1% | 31.0% |
| query88 | 483.3 | 95.0 | 90.2 | 435.6% | 5.3% |
| query90 | 122.3 | 41.1 | 26.9 | 355.3% | 53.0% |
| query92 | 278.6 | 142.5 | 135.7 | 105.4% | 5.1% |
| query96 | 42.8 | 24.4 | 16.4 | 160.6% | 48.6% |
| query97 | 279.0 | 147.5 | 133.4 | 109.1% | 10.6% |
| query98 | 1451.5 | 50.0 | 38.6 | 3662.2% | 29.5% |
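
The slide does not say how the two gain columns are computed. Spot-checking rows suggests that Tez Relative Gain is (M/R time / hot Tez time − 1) and Hot / Cold Gain is (cold Tez time / hot Tez time − 1), both as percentages. The Python sketch below reproduces that arithmetic for two rows. The formula is an inference from the numbers, the timings are assumed to be in seconds, and `relative_gain` is just an illustrative name. Because the published timings are rounded to one decimal place, recomputed gains for other rows may differ from the table in the last digit.

```python
def relative_gain(slower: float, faster: float) -> float:
    """Percentage by which `slower` exceeds `faster` (assumed formula)."""
    return (slower / faster - 1.0) * 100.0


# (M/R, Tez cold, Tez hot) timings copied from the table above;
# the unit is assumed to be seconds.
timings = {
    "query21": (39.3, 52.4, 46.9),
    "query67": (820.9, 821.1, 787.2),
}

for name, (mr, cold, hot) in timings.items():
    tez_gain = relative_gain(mr, hot)         # M/R vs. hot Tez run
    hot_cold_gain = relative_gain(cold, hot)  # cold vs. hot Tez run
    print(f"{name}: Tez gain {tez_gain:.1f}%, hot/cold gain {hot_cold_gain:.1f}%")
```

A negative value, as for query21, means the M/R run was faster than the hot Tez run.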

Query 88

select *
from
  (select count(*) h8_30_to_9
   from store_sales
   JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
   JOIN time_dim ON store_sales.ss_sold_time_sk = time_dim.t_time_sk
   JOIN store ON store_sales.ss_store_sk = store.s_store_sk
   where time_dim.t_hour = 8
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2)
       or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)
       or (household_demographics.hd_dep_count = 1 and household_demographics.hd_vehicle_count<=1+2))
     and store.s_store_name = 'ese') s1
  JOIN
  (select count(*) h9_to_9_30
   from store_sales ...

• 8 full table scans

Query 88: M/R

Total MapReduce jobs = 29
...
Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 39 seconds 380 msec
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 403.28 seconds, Fetched: 1 row(s)

Query 88: Tez

Map 1: 1/1 Map 11: 1/1 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1 Map 15: 1/1 Map 16: 241/241 Map 18: 1/1 Map 19: 1/1 Map 2: 1/1 Map 20: 1/1 Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 241/241 Map 26: 1/1 Map 27: 1/1 Map 28: 1/1 Map 29: 1/1 Map 3: 241/241 Map 30: 240/240 Map 32: 241/241 Map 34: 1/1 Map 35: 1/1 Map 36: 1/1 Map 37: 1/1 Map 38: 1/1 Map 39: 241/241 Map 42: 1/1 Map 43: 1/1 Map 44: 240/240 Map 46: 241/241 Reducer 10: 1/1 Reducer 17: 1/1 Reducer 25: 1/1 Reducer 31: 1/1 Reducer 33: 1/1 Reducer 4: 1/1 Reducer 40: 1/1 Reducer 41: 1/1 Reducer 45: 1/1 Reducer 47: 1/1 Reducer 5: 1/1 Reducer 6: 1/1 Reducer 7: 1/1 Reducer 8: 1/1 Reducer 9: 1/1
Status: Finished successfully
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 90.233 seconds, Fetched: 1 row(s)

Status

• Broadcast Join
  – Regular tasks to filter/prep the side to broadcast
  – Hashtables assembled in the join task
  – Can run in any vertex (not just map)

• TezSessions (AM, FS, UGI, MetaStore)
  – Start with cli/hs2 session
  – Brings up AM, connects to metastore, etc.
  – Setup only once per session

• Container reuse
  – Task launch is now cheap
  – Multiple waves and re-use within session
  – Stragglers

• Multiple inputs/outputs / TezProcessor
  – Can handle multiple scatter/gather + broadcast + 1-1 edges
  – Can handle multiple outputs for multi-table insert case
  – No need for single task with multiple operator pipelines
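
The broadcast join item above describes three steps: a regular task filters the small side, the join task assembles a hashtable from it, and the large side is probed against that hashtable. The Python sketch below illustrates the general technique only. It is not Hive's or Tez's implementation; the function names and sample rows are made up, with column names borrowed from Query 88.

```python
from collections import defaultdict


def build_hashtable(small_rows, key):
    """Assemble the broadcast side into a hashtable keyed on the join column."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def broadcast_join(big_rows, hashtable, big_key):
    """Stream the large side and probe the in-memory hashtable row by row."""
    for row in big_rows:
        for match in hashtable.get(row[big_key], []):
            yield {**match, **row}


# Made-up sample data (illustrative only).
store = [
    {"s_store_sk": 1, "s_store_name": "ese"},
    {"s_store_sk": 2, "s_store_name": "ought"},
]
store_sales = [
    {"ss_store_sk": 1, "ss_ticket": 100},
    {"ss_store_sk": 2, "ss_ticket": 101},
    {"ss_store_sk": 1, "ss_ticket": 102},
]

# Step 1: a regular task filters/preps the side to broadcast.
filtered_store = [s for s in store if s["s_store_name"] == "ese"]
# Step 2: the join task assembles the hashtable.
hashtable = build_hashtable(filtered_store, "s_store_sk")
# Step 3: the large table is joined without a shuffle on its rows.
joined = list(broadcast_join(store_sales, hashtable, "ss_store_sk"))
print(len(joined), "joined rows")
```

The approach only pays off when the broadcast side fits in each join task's memory, since every task holds a full copy of the hashtable.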

Status

• Localization
  – Works with hive-exec + UDFs
  – If desired: avoids re-localization of hive-exec

• Split Gen in AM / TezGroupedSplits / Caching
  – Splits generated according to headroom
  – Caching of NN connections

• Statistics (not Tez specific)
  – Allows computing the number of tasks
  – Used for join conversion
  – Degrades with available stats

• MetaStore improvements (not Tez specific)
  – Partition pruning is MUCH faster now

• TezMiniMR
  – .q file tests for Tez

• Explain plan
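
To illustrate what "splits generated according to headroom" in the split generation item above could mean, here is a deliberately simplified Python sketch: the original input splits are grouped so that the number of tasks roughly matches the capacity currently available. The function name, the `headroom_slots` parameter, and the `waves` multiplier are assumptions for illustration. The real grouping logic in Tez is not shown on the slide and takes more factors into account than this sketch does.

```python
import math


def group_splits(splits, headroom_slots, waves=1.0):
    """Group input splits so the task count fits the available headroom.

    `headroom_slots` is the number of tasks that could run right now and
    `waves` is a hypothetical multiplier allowing more than one wave.
    """
    if not splits:
        return []
    # Never ask for more groups than splits, and always for at least one.
    desired = max(1, min(len(splits), int(headroom_slots * waves)))
    per_group = math.ceil(len(splits) / desired)
    return [splits[i:i + per_group] for i in range(0, len(splits), per_group)]


# Ten hypothetical splits and room for four tasks.
splits = [f"split-{i}" for i in range(10)]
groups = group_splits(splits, headroom_slots=4)
print([len(g) for g in groups])
```

Each resulting group would be processed by one task, so a busy cluster yields fewer, larger tasks and an idle one yields more, smaller tasks.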

Current limitations

• Not in phase I
  – RC Merge task / analyze uses MR on Tez
  – UNION ALL not yet supported
  – SMB join not yet supported

• In phase I
  – More testing + bug fixes!
  – Integrate with new annotated
  – Re-localization (Tez)
  – Tez release

Try Tez For Yourself

• 1: Download Hortonworks Sandbox 2.0: hortonworks.com/sandbox
• 2: Log in: root/hadoop
• 3: git clone https://github.com/t3rmin4t0r/tez-autobuild/
• 4: cd tez-autobuild ; make dist install
• 5: /opt/hive/bin/hive
• 6: set hive.optimize.tez=true/false