Hive: Data Organization for Performance
Gopal Vijayaraghavan
© Hortonworks Inc. 2011 – 2016. All Rights Reserved


Page 1
Hortonworks Inc All Rights Reserved
Hive: Data Organization for Performance
Gopal Vijayaraghavan

Page 2
In this episode
All BigData problems are primarily lookup problems
All Lookup problems are really Storage problems
All Storage problems turn into ETL problems
ETL problems are all about the Data
Data navigation? Data organization? Data ingestion? It's Big?

Page 3
Good idea: Do things that scale!
There are many problems like this, but this one is mine

Page 4
Partitions
If you have a database on cars and you partition on VIN#
If you have a database on sales and you partition on customer_id
Rule of thumb: Average partition is >= 1 GB and total # of partitions per query [...]

[...] BigInt or Float -> Double)

Page 13
Questions?
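
The size half of the Page 4 rule of thumb (average partition of at least 1 GB) can be checked with simple arithmetic before choosing a partition column. The sketch below is only an illustration, not something from the slides: the function names, the table size, and the key counts are all hypothetical, and it assumes each distinct key value becomes one partition with data spread evenly. The per-query partition limit is cut off in the transcript, so it is not modeled here.

```python
GIB = 1024 ** 3  # bytes in one GiB; the slide's "1Gb" is read here as about one gigabyte


def average_partition_bytes(total_bytes: int, distinct_keys: int) -> float:
    """Average partition size if every distinct key value becomes one partition."""
    if distinct_keys <= 0:
        raise ValueError("distinct_keys must be positive")
    return total_bytes / distinct_keys


def meets_size_rule(total_bytes: int, distinct_keys: int, min_avg_bytes: int = GIB) -> bool:
    """True when the average partition is at least min_avg_bytes."""
    return average_partition_bytes(total_bytes, distinct_keys) >= min_avg_bytes


# Made-up numbers for a hypothetical 2 TiB sales table.
table_bytes = 2 * 1024 * GIB

# One partition per customer_id, assuming 50 million customers: tiny partitions.
print(meets_size_rule(table_bytes, 50_000_000))  # False

# A hypothetical coarser key with 730 distinct values: about 2.8 GiB per partition.
print(meets_size_rule(table_bytes, 730))  # True
```

Under these assumptions, a high-cardinality key such as VIN# or customer_id fails the check by several orders of magnitude, which is presumably why the slide singles those examples out.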