Post on 23-Feb-2016
1
Sqoop 2 Introduction
Mengwei Ding, Software Engineer Intern at Cloudera
2
What is Sqoop
• Apache Top-Level Project
• SQl and hadOOP
• Transfer a large bulk of data
• From relational data warehouses: Teradata, MySQL, PostgreSQL, Oracle, Netezza
• To Hadoop ecosystem: HDFS, Hive, HBase, Avro
• Vice versa
• Sqoop 1 (1.4.3) and Sqoop 2 (1.99.2)
3
Sqoop 1
4
Sqoop 1 Challenges
• Command line tool, configured with command-line arguments (60+!)
• Connector-driven:
  o Responsible for metadata lookups and data transfer
  o JDBC vocabulary-enforced (--connect)
  o Implicit connector selection
• Non-uniform, duplicated functionality
• Client accesses Hadoop configurations and databases directly
• Security Concerns:
  o Client needs to know credentials to databases
• Type mapping is not clearly defined
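
To make the first and fifth challenges concrete, here is a minimal Python sketch that assembles a Sqoop 1 `import` invocation as an argument list. The flags shown (`--connect`, `--username`, `--password`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop 1 options; the host, database, user, password, and paths are made-up placeholders, and the helper name `build_sqoop1_import` is hypothetical. The sketch only builds the command and does not run it.

```python
from typing import List


def build_sqoop1_import(jdbc_url: str, username: str, password: str,
                        table: str, target_dir: str,
                        mappers: int = 4) -> List[str]:
    """Assemble the argv for a Sqoop 1 import (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC vocabulary is required up front
        "--username", username,      # credentials are supplied by the client...
        "--password", password,      # ...which is the security concern
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


# Placeholder values for illustration only.
argv = build_sqoop1_import(
    "jdbc:mysql://db.example.com/sales", "etl_user", "s3cret",
    "orders", "/data/sales/orders")
print(" ".join(argv))
```

Even this small import needs six options, and the database password sits on the client's command line.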
5
Sqoop 2 - Design Goals
• Same goal: transfer data around
• Ease of Use
  o Sqoop as a Service
  o Domain Specific Interactions without too many args
• Ease of Extension
  o No low-level Hadoop knowledge needed
  o Uniform functionality of connectors, no functional overlap between connectors
• Security and Separation of Concerns
  o Role based access and use
6
Sqoop 2 - Design Goals
7
Sqoop 2 - Connection vs Job Metadata
• There are two distinct sets of options
  o Connection (distinct per database)
  o Job (distinct per table)
8
Sqoop 2 - Connection vs Job Metadata
• There are another two distinct sets of arguments
  o Connector specific
  o Shared across all connectors
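
The two splits on these slides (connection vs. job, and connector-specific vs. shared) can be sketched as a small data model. This is an illustrative Python sketch, not the Sqoop 2 API: the class names, field names, and the choice of which example option lands in which bucket are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Connection:
    """Per-database options (hypothetical model, not the Sqoop 2 API)."""
    name: str
    connector: str
    # Options every connector understands (assumed example keys).
    shared: Dict[str, str] = field(default_factory=dict)
    # Options defined by this particular connector (assumed example keys).
    connector_specific: Dict[str, str] = field(default_factory=dict)


@dataclass
class Job:
    """Per-table options; refers to a connection instead of repeating it."""
    name: str
    connection: Connection
    shared: Dict[str, str] = field(default_factory=dict)
    connector_specific: Dict[str, str] = field(default_factory=dict)


warehouse = Connection(
    name="warehouse",
    connector="generic-jdbc",
    shared={"max_connections": "10"},
    connector_specific={"jdbc_url": "jdbc:mysql://db.example.com/sales"},
)

# Two jobs for two tables reuse the single connection definition.
orders = Job("import-orders", warehouse,
             shared={"output_dir": "/data/sales/orders"},
             connector_specific={"table": "orders"})
customers = Job("import-customers", warehouse,
                shared={"output_dir": "/data/sales/customers"},
                connector_specific={"table": "customers"})

print(orders.connection.name, customers.connection.name)
```

The point of the split is that database details are entered once on the connection, while each job carries only what differs per table.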
9
Sqoop 2 - Security
• Support for secure access to external systems via role-based access to connection objects
  o Administrators create/edit/delete connections
  o Operators use connections
• Connections encompass credentials
  o Connection created once, then reused later
  o Created by Admin, used by operator to safeguard credential access from end user
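
The role split described on this slide can be sketched in a few lines of Python. This is a hypothetical illustration, not Sqoop 2 code: `Role`, `ConnectionStore`, and their methods are invented names, and a real server would also cover edit and delete. The idea shown is that administrators store credentials once, and operators refer to a connection by name without ever reading those credentials.

```python
from enum import Enum
from typing import Dict


class Role(Enum):
    ADMIN = "admin"
    OPERATOR = "operator"


class ConnectionStore:
    """Keeps connection credentials on the server side (hypothetical sketch)."""

    def __init__(self) -> None:
        self._credentials: Dict[str, Dict[str, str]] = {}

    def create(self, role: Role, name: str,
               username: str, password: str) -> None:
        # Only administrators may create connections.
        if role is not Role.ADMIN:
            raise PermissionError("only administrators can create connections")
        self._credentials[name] = {"username": username, "password": password}

    def use(self, role: Role, name: str) -> str:
        # Any role may use an existing connection by name;
        # the stored credentials are never returned to the caller.
        if name not in self._credentials:
            raise KeyError(name)
        return f"job submitted by {role.value} using connection '{name}'"


store = ConnectionStore()
store.create(Role.ADMIN, "warehouse", "etl_user", "s3cret")
print(store.use(Role.OPERATOR, "warehouse"))
```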
10
Sqoop 2 - Resource Management
• Connections allow specification of resource policy
  o Administrator can limit the total number of physical connections open at one time
  o Connections can be disabled
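
A resource policy of this kind can be sketched as a simple counter with a cap and an on/off switch. The Python below is a hypothetical illustration under those assumptions; `ConnectionPolicy`, `acquire`, and `release` are invented names, and the slides do not say how Sqoop 2 enforces the limit internally.

```python
class ConnectionPolicy:
    """Caps concurrently open physical connections (hypothetical sketch)."""

    def __init__(self, max_open: int, enabled: bool = True) -> None:
        self.max_open = max_open
        self.enabled = enabled
        self._open = 0

    def acquire(self) -> bool:
        """Count one more open connection if the policy allows it."""
        if not self.enabled or self._open >= self.max_open:
            return False
        self._open += 1
        return True

    def release(self) -> None:
        """Give back one open connection, never dropping below zero."""
        if self._open > 0:
            self._open -= 1


policy = ConnectionPolicy(max_open=2)
results = [policy.acquire() for _ in range(3)]
print(results)           # [True, True, False]: the third request exceeds the cap

policy.release()
policy.enabled = False   # an administrator disables the connection
print(policy.acquire())  # False: disabled connections accept no new work
```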
11
Sqoop 2 - Current Status
• Primary focus of Sqoop community
• Second cut: 1.99.2
  o bits and docs: http://sqoop.apache.org
12
Demo Time
13
Thank You!