Sqoop 2 Introduction

Mengwei Ding, Software Engineer Intern at Cloudera

Transcript of Sqoop 2 Introduction

Page 1: Sqoop  2 Introduction

Sqoop 2 Introduction
Mengwei Ding, Software Engineer Intern at Cloudera

Page 2: Sqoop  2 Introduction

What is Sqoop

• Apache Top-Level Project
• SQl and hadOOP (SQL + Hadoop)
• Transfers large bulks of data
o From relational data warehouses: Teradata, MySQL, PostgreSQL, Oracle, Netezza
o To the Hadoop ecosystem: HDFS, Hive, HBase, Avro
o And vice versa
• Sqoop 1 (1.4.3) and Sqoop 2 (1.99.2)

Page 3: Sqoop  2 Introduction

Sqoop 1

Page 4: Sqoop  2 Introduction

Sqoop 1 Challenges

• Command-line tool, configured with command-line arguments (60+!)
• Connector-driven:
o Connectors are responsible for metadata lookups and data transfer
o JDBC vocabulary is enforced (--connect)
o Connector selection is implicit
• Non-uniform, duplicated functionality
• Client accesses Hadoop configurations and databases directly
• Security concerns:
o Client needs to know the database credentials
• Type mapping is not clearly defined
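
To make the argument-heavy, credential-on-the-client style concrete, here is a minimal sketch that assembles a typical Sqoop 1 import invocation. The flags shown (--connect, --username, --password, --table, --target-dir, --num-mappers) are standard Sqoop 1 import arguments; the host, database, user, password, and paths are hypothetical placeholders, and the helper function is illustrative only.

```python
import shlex


def build_sqoop1_import(jdbc_url, username, password, table, target_dir, mappers=4):
    """Assemble a Sqoop 1 import invocation as an argv list (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC vocabulary is baked into the CLI
        "--username", username,       # the client must know the credentials
        "--password", password,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


# Hypothetical values, for illustration only.
argv = build_sqoop1_import(
    "jdbc:mysql://db.example.com/sales",
    "etl_user",
    "s3cret",
    "orders",
    "/data/orders",
)
command_line = shlex.join(argv)
print(command_line)
```

Even this small import already needs six options, and the database password has to be present on the machine that runs the command.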

Page 5: Sqoop  2 Introduction

Sqoop 2 - Design Goals

• Same goal: transfer data around
• Ease of Use
o Sqoop as a Service
o Domain-specific interactions without too many arguments
• Ease of Extension
o No low-level Hadoop knowledge needed
o Uniform functionality of connectors, no functional overlap between connectors
• Security and Separation of Concerns
o Role-based access and use

Page 6: Sqoop  2 Introduction

Sqoop 2 - Design Goals

Page 7: Sqoop  2 Introduction

Sqoop 2 - Connection vs Job Metadata

• There are two distinct sets of options
o Connection (distinct per database)
o Job (distinct per table)
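
The connection/job split can be sketched as two small record types. This is an illustrative model only, not Sqoop 2's actual classes: the class names, field names, and sample values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Options that are distinct per database (hypothetical fields)."""
    name: str
    jdbc_url: str
    username: str
    password: str


@dataclass(frozen=True)
class Job:
    """Options that are distinct per table; a job points at a connection."""
    name: str
    connection: Connection
    table: str
    target_dir: str


# One connection is defined once and shared by several jobs.
warehouse = Connection("warehouse", "jdbc:mysql://db.example.com/sales", "etl_user", "s3cret")
orders_job = Job("import-orders", warehouse, "orders", "/data/orders")
customers_job = Job("import-customers", warehouse, "customers", "/data/customers")
```

Because the database details live only in the connection, adding another table means adding another job, with no repeated connection arguments.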

Page 8: Sqoop  2 Introduction

Sqoop 2 - Connection vs Job Metadata

• Another two distinct sets of arguments
o Connector-specific
o Shared across all connectors
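
One way to picture this second axis is that both connection options and job options are each divided into a connector-specific part and a shared part. The sketch below assumes that reading; the `OptionSet` type, the `flatten` helper, and every option name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OptionSet:
    connector: dict = field(default_factory=dict)  # meaningful to one connector only
    shared: dict = field(default_factory=dict)     # meaningful to every connector


def flatten(options):
    """Return one dict whose keys are prefixed with the scope they belong to."""
    flat = {f"connector.{key}": value for key, value in options.connector.items()}
    flat.update({f"shared.{key}": value for key, value in options.shared.items()})
    return flat


# Hypothetical option names, for illustration only.
connection_options = OptionSet(
    connector={"jdbc_url": "jdbc:mysql://db.example.com/sales"},
    shared={"max_connections": 10},
)
job_options = OptionSet(
    connector={"table": "orders"},
    shared={"output_dir": "/data/orders"},
)

print(flatten(job_options))
```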

Page 9: Sqoop  2 Introduction

Sqoop 2 - Security

• Support for secure access to external systems via role-based access to connection objects
o Administrators create/edit/delete connections
o Operators use connections
• Connections encompass credentials
o A connection is created once, then reused later
o Created by an administrator and used by operators, so credentials are never exposed to the end user
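
The role split described above can be sketched as a small store that checks the caller's role before each operation. This is a toy model under stated assumptions, not Sqoop 2's security implementation: `Role`, `ConnectionStore`, and all method names are hypothetical.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    OPERATOR = "operator"


class ConnectionStore:
    """Holds connections; only administrators may change them."""

    def __init__(self):
        self._connections = {}  # connection name -> credentials dict

    def create(self, role, name, credentials):
        if role is not Role.ADMIN:
            raise PermissionError("only administrators may create connections")
        self._connections[name] = dict(credentials)

    def delete(self, role, name):
        if role is not Role.ADMIN:
            raise PermissionError("only administrators may delete connections")
        del self._connections[name]

    def run_import(self, role, name, table):
        """Any role may use a connection; the credentials stay inside the store."""
        if name not in self._connections:
            raise KeyError(f"unknown connection: {name}")
        return f"importing {table} via connection {name!r}"


store = ConnectionStore()
store.create(Role.ADMIN, "warehouse", {"username": "etl_user", "password": "s3cret"})
print(store.run_import(Role.OPERATOR, "warehouse", "orders"))
```

The operator refers to the connection by name only, so the password never has to leave the server side.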

Page 10: Sqoop  2 Introduction

Sqoop 2 - Resource Management

• Connections allow specification of a resource policy
o Administrators can limit the total number of physical connections open at one time
o Connections can be disabled
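
A resource policy of this kind can be sketched with a bounded semaphore that caps concurrently open physical connections, plus an enabled flag. This is an illustrative model, not how Sqoop 2 implements it; the `ManagedConnection` class and its methods are hypothetical.

```python
import threading


class ManagedConnection:
    """A connection object carrying a simple resource policy."""

    def __init__(self, name, max_open):
        self.name = name
        self.enabled = True
        self._slots = threading.BoundedSemaphore(max_open)

    def open_physical(self):
        """Claim one physical connection slot, or fail if policy forbids it."""
        if not self.enabled:
            raise RuntimeError(f"connection {self.name!r} is disabled")
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"connection {self.name!r} reached its open-connection limit")

    def close_physical(self):
        """Give the slot back so another job can use it."""
        self._slots.release()


warehouse = ManagedConnection("warehouse", max_open=2)
warehouse.open_physical()
warehouse.open_physical()   # the limit of 2 is now reached
warehouse.close_physical()  # one slot is free again
warehouse.close_physical()
```

Here a request beyond the limit fails immediately; a real service might queue the request instead, which is a design choice this sketch does not cover.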

Page 11: Sqoop  2 Introduction

Sqoop 2 - Current Status

• Primary focus of the Sqoop community
• Second cut: 1.99.2
o Bits and docs: http://sqoop.apache.org

Page 12: Sqoop  2 Introduction

Demo Time

Page 13: Sqoop  2 Introduction

Thank You!