Post on 06-May-2015
@alepoletto
Hive
Hive – What is it?
• Data warehouse system layer built on top of Hadoop
• Defines structure for your unstructured big data
• Queries this data using HiveQL, a SQL-like language
Hive – is not… a relational database
• Uses a relational database to store metadata
• Data that Hive processes is stored in HDFS
Hive – is not… designed for online transactions
• Runs on Hadoop (a batch processing system)
• Jobs can have high latency and overhead
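To make the batch nature concrete, here is a minimal Python sketch of how an aggregate query such as SELECT page, COUNT(*) ... GROUP BY page could be split into map, shuffle, and reduce phases. It is a toy in-memory model, not Hive's actual execution engine; the PAGE_VIEWS data and all function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical input rows: (user, page). In Hive these would be files in HDFS.
PAGE_VIEWS = [
    ("ana", "/home"),
    ("bob", "/home"),
    ("ana", "/cart"),
    ("cid", "/home"),
]


def map_phase(rows):
    """Emit one (page, 1) pair per row, like a mapper for COUNT(*) ... GROUP BY page."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Collect all values under their key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key to produce the final counts."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    """Run the three phases in order over the input rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(PAGE_VIEWS))
```

Every phase reads the whole input before the next one can finish, which is why even a small question pays the cost of a full batch job.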
Hive – is not… for real-time queries and row updates
• Suited for batch jobs over large sets of immutable data
Hive – What it does
• Hadoop was built to organize and store massive amounts of data.
• A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats.
• Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight.
Hive – Architecture
Hive – Tables
• Hive tables
  • Data: in files in HDFS
  • Schema: in metadata stored in relational tables
• Schema and data are kept separate
• Hive needs a schema for existing HDFS data
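The separation of schema and data can be sketched in a few lines of Python. This is only an illustration of the idea (often called schema on read), not how Hive is implemented: RAW_FILE stands in for a file in HDFS, METASTORE for the relational metadata store, and the "orders" table and its columns are invented for the example.

```python
import csv
import io

# Stand-in for a file already sitting in HDFS: raw, untyped text.
RAW_FILE = "1,ana,34.5\n2,bob,12.0\n3,cid,7.25\n"

# Stand-in for the metastore: the table schema lives apart from the data.
METASTORE = {
    "orders": {
        "delimiter": ",",
        "columns": [("id", int), ("customer", str), ("amount", float)],
    }
}


def read_table(name, raw_text):
    """Apply the stored schema to the raw text at read time."""
    table = METASTORE[name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=table["delimiter"])
    rows = []
    for fields in reader:
        # Pair each raw field with its declared column name and type.
        row = {col: cast(value) for (col, cast), value in zip(table["columns"], fields)}
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in read_table("orders", RAW_FILE):
        print(row)
```

The file itself never changes; only the metadata decides how its bytes are interpreted, which is why a schema can be added to data that already exists.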
Hive – Pig vs. Hive
Pig is good for
• ETL
• Preparing data for easier analysis
• Long series of steps to perform
Hive is good for
• Querying data
• Getting answers to specific questions
• Users who are familiar with SQL
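The difference in style can be sketched in Python, under loud assumptions: neither function uses Pig or Hive. pipeline_style imitates a step-by-step dataflow script, and query_style uses the standard-library sqlite3 module purely as a stand-in for a SQL engine. The SALES data is made up.

```python
import sqlite3

# Hypothetical input: (region, amount).
SALES = [("north", 120), ("south", 80), ("north", 40), ("east", 10)]


def pipeline_style(rows):
    """Pig-like: a series of explicit steps, each producing a new dataset."""
    # Step 1: keep only the larger sales.
    filtered = [(region, amount) for region, amount in rows if amount >= 40]
    # Step 2: group the amounts by region.
    grouped = {}
    for region, amount in filtered:
        grouped.setdefault(region, []).append(amount)
    # Step 3: total each group.
    return {region: sum(amounts) for region, amounts in grouped.items()}


def query_style(rows):
    """Hive-like: one declarative question; the engine decides the steps."""
    conn = sqlite3.connect(":memory:")  # sqlite3 is only a stand-in here
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales WHERE amount >= 40 GROUP BY region"
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(pipeline_style(SALES))
    print(query_style(SALES))
```

Both functions answer the same question; the first spells out how, the second only states what is wanted.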
Hive – HiveQL
HCatalog – What it does
• Metadata and table management system for Hadoop
• Shared schema and data type mechanism for different Hadoop tools such as Pig, Hive, and MapReduce
• Interoperability across data processing tools
• Table abstraction, so you don't need to worry about where and how the data is stored
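The table abstraction can be illustrated with a small Python sketch. It does not use the HCatalog API: CATALOG, STORAGE, the "memory://" location scheme, and the "events" table are all hypothetical stand-ins, chosen only to show two tools sharing one table definition.

```python
import json

# Hypothetical catalog: one shared table definition for every tool.
CATALOG = {
    "events": {
        "location": "memory://events",  # made-up location scheme for the demo
        "format": "json_lines",
        "columns": ["user", "action"],
    }
}

# Stand-in for the storage layer, keyed by location.
STORAGE = {
    "memory://events": (
        '{"user": "ana", "action": "login"}\n'
        '{"user": "bob", "action": "click"}\n'
    ),
}


def load_table(name):
    """Resolve a table by name; callers never see the path or the file format."""
    entry = CATALOG[name]
    raw = STORAGE[entry["location"]]
    if entry["format"] != "json_lines":
        raise ValueError(f"unsupported format: {entry['format']}")
    records = [json.loads(line) for line in raw.splitlines() if line]
    return [tuple(record[col] for col in entry["columns"]) for record in records]


def script_tool():
    """A Pig-like consumer: reads the rows in order to transform them."""
    return [(user, action.upper()) for user, action in load_table("events")]


def query_tool():
    """A Hive-like consumer: computes an aggregate over the same table."""
    return len(load_table("events"))


if __name__ == "__main__":
    print(script_tool())
    print(query_tool())
```

If the data moved or changed format, only the catalog entry and the loader would change; both consumers would keep asking for "events" by name.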
HCatalog – Summary
• “Takes Hive metadata and opens it up to everybody else”
HCatalog – Overview
• Access data through HCatalog
HCatalog – Architecture