Sentry - An Introduction

Sentry: Open Source Authorization for Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
Wednesday, 7th November 2013

description

A talk I gave in Stuttgart about Sentry, with a live demo.

Transcript of Sentry - An Introduction

Page 1: Sentry - An Introduction

Sentry: Open Source Authorization for Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
Wednesday, 7th November 2013

Page 2: Sentry - An Introduction

Defining Security Functions

Perimeter: Guarding access to the cluster itself
Technical concepts: authentication, network isolation

Data: Protecting data in the cluster from unauthorized visibility
Technical concepts: encryption, data masking

Access: Defining what users and applications can do with data
Technical concepts: permissions, authorization

Visibility: Reporting on where data came from and how it's being used
Technical concepts: auditing, lineage

Page 3: Sentry - An Introduction

Enabling Enterprise Security

Perimeter: Guarding access to the cluster itself
Technical concepts: authentication, network isolation

Data: Protecting data in the cluster from unauthorized visibility
Technical concepts: encryption, data masking

Access: Defining what users and applications can do with data
Technical concepts: permissions, authorization

Visibility: Reporting on where data came from and how it's being used
Technical concepts: auditing, lineage

Sentry; Kerberos | Oozie | Knox; Cloudera Navigator; Certified Partners

Available 7/23

Page 4: Sentry - An Introduction

Hive Overview

SQL Access to Hadoop
• MapReduce: great massively scalable batch processing framework; required development for each new job
• Hive opened up Hadoop for more users with standard SQL

Key Challenges
• Batch MapReduce too slow for interactive BI/analytics
• No concurrency, no security

Options Today
• Impala designed for low-latency queries
• HiveServer2 delivers concurrency, authentication

Page 5: Sentry - An Introduction

Our Open Source Activity

CDH 4.1 (HiveServer2)
• Concurrency and Kerberos authentication for Hive
• JDBC and Beeline clients

CDH 4.2
• HDFS impersonation authorization as stop-gap
• Pluggable authentication API
• JDBC LDAP username/password

ODBC
• Supports Kerberos authentication and LDAP
• Extended partner certification

Page 6: Sentry - An Introduction

Current State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop

Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn't guard against malicious users

HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level not granular enough
• Problem: Not role-based

Page 7: Sentry - An Introduction

Authorization Requirements

Secure Authorization
Ability to control access to data and/or privileges on data for authenticated users

Fine-Grained Authorization
Ability to give users access to a subset of data (e.g. column) in a database

Role-Based Authorization
Ability to create/apply templatized privileges based on functional roles

Multi-Tenant Administration
Ability for central admin group to empower lower-level admins to manage security for each database/schema

Page 8: Sentry - An Introduction

The Next Step: Introducing Sentry
Authorization module for Hive & Impala

Unlocks Key RBAC Requirements
• Secure, fine-grained, role-based authorization
• Multi-tenant administration

Open Source
• Intent to donate to ASF

Available and Fully Supported
• HiveServer2 & Impala 1.1 initially

Page 9: Sentry - An Introduction

Key Benefits of Sentry

• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Enable New Use Cases
• Enable Multi-User Applications
• Comply with Regulations

Page 10: Sentry - An Introduction

Key Capabilities of Sentry

Fine-Grained Authorization
• Specify security for SERVERS, DATABASES, TABLES & VIEWS

Role-Based Authorization
• SELECT privilege on views & tables
• INSERT privilege on tables
• TRANSFORM privilege on servers
• ALL privilege on the server, databases, tables & views
• ALL privilege is needed to create/modify schema

Multi-Tenant Administration
• Separate policies for each database/schema
• Can be maintained by separate admins
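To make the privilege levels on this slide concrete, here is a minimal Python sketch of how a broader grant could cover a narrower request. It assumes privilege strings in the `server=...->db=...->table=...->action=...` form used by the example policy later in the deck. The names `parse_privilege` and `implies` are hypothetical and are not Sentry's API, and URI privileges and TRANSFORM are left out.

```python
from typing import Dict

# Hypothetical, simplified model of a privilege string such as
# "server=server1->db=customers->table=*->action=select".
HIERARCHY = ("server", "db", "table")


def parse_privilege(text: str) -> Dict[str, str]:
    """Split 'key=value->key=value' into a dict with lowercase keys."""
    parts = (segment.split("=", 1) for segment in text.split("->"))
    return {key.strip().lower(): value.strip() for key, value in parts}


def implies(granted: str, requested: str) -> bool:
    """Return True if the granted privilege covers the requested one."""
    g, r = parse_privilege(granted), parse_privilege(requested)
    for level in HIERARCHY:
        if level not in g:
            break  # the grant stops at this level, so it covers everything below
        if g[level] != "*" and g[level] != r.get(level):
            return False
    # Assumption: a grant without an explicit action means ALL.
    g_action = g.get("action", "all").lower()
    r_action = r.get("action", "all").lower()
    return g_action in ("all", "*") or g_action == r_action
```

Under these assumptions, `implies("server=server1", "server=server1->db=customers->table=t->action=insert")` holds, while a grant limited to `action=select` does not cover an `insert` request on the same table.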

Page 11: Sentry - An Introduction

Apache Ecosystem and Sentry

• Shared Hive Metastore (with HCatalog)
• Extensibility plug-in for HiveServer2
• Inline support in Impala 1.1
• Potential extension to Pig, MapReduce, REST (possible future development)

[Diagram: HCatalog, Sentry, Hive Metastore]

Page 12: Sentry - An Introduction

Sentry Architecture

[Architecture diagram, top to bottom]
Query engines: Impala, HiveServer2
Binding Layer: Impala, Hive, Future
Authorization Provider:
• Policy Engine (interface): evaluation, validation
• Policy Provider (interface): parsing; File (local FS/HDFS), Database
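The layering in this diagram can be sketched in Python as three small classes, one per layer. This is an illustration only: the class and method names are hypothetical, the policy provider is an in-memory dictionary standing in for a file on local FS/HDFS, and the engine does exact string matching rather than real privilege evaluation.

```python
from typing import Dict, List, Set


class FilePolicyProvider:
    """Policy provider: supplies the privileges granted to each group."""

    def __init__(self, group_privileges: Dict[str, Set[str]]):
        # Stand-in for a parsed policy file (assumption for this sketch).
        self._group_privileges = group_privileges

    def privileges_for(self, groups: List[str]) -> Set[str]:
        granted: Set[str] = set()
        for group in groups:
            granted |= self._group_privileges.get(group, set())
        return granted


class PolicyEngine:
    """Policy engine: evaluates a required privilege against the provider."""

    def __init__(self, provider: FilePolicyProvider):
        self._provider = provider

    def is_allowed(self, groups: List[str], required: str) -> bool:
        # Simplification: exact match only, no hierarchy or wildcards.
        return required in self._provider.privileges_for(groups)


class HiveBinding:
    """Binding layer: maps an engine-specific operation to a privilege."""

    def __init__(self, engine: PolicyEngine, server: str):
        self._engine = engine
        self._server = server

    def authorize_select(self, groups: List[str], db: str, table: str) -> bool:
        required = f"server={self._server}->db={db}->table={table}->action=select"
        return self._engine.is_allowed(groups, required)
```

The point of the split is that a second binding (for example one for Impala) could reuse the same engine and provider, and a database-backed provider could replace the file-backed one without touching the bindings.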

Page 13: Sentry - An Introduction

Query Execution Flow

SQL → Parse → Build → Check → Plan → MR / Query

1. Parse: Validate SQL grammar
2. Build: Construct statement tree
3. Check: Validate statement objects
   • First check: Authorization (Sentry)
4. Plan: Forward to execution planner
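The ordering of these stages matters: authorization runs before planning, so a rejected statement never reaches the execution planner. The following Python sketch mimics that ordering with toy stages. All function names are hypothetical, the "grammar" accepts only `SELECT ... FROM db.table`, and privileges are plain tuples rather than Sentry policy entries.

```python
import re
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str, str]  # (db, table, action), an assumption of this sketch


class AuthorizationError(Exception):
    """Raised when the check stage rejects a statement."""


def parse(sql: str) -> re.Match:
    """Parse: validate grammar (toy grammar: SELECT ... FROM db.table)."""
    pattern = r"\s*SELECT\s+.+?\s+FROM\s+(\w+)\.(\w+)\s*;?\s*"
    match = re.fullmatch(pattern, sql, re.IGNORECASE)
    if match is None:
        raise SyntaxError(f"unsupported statement: {sql!r}")
    return match


def build(match: re.Match) -> Dict[str, str]:
    """Build: construct a (very small) statement tree."""
    return {"action": "select", "db": match.group(1), "table": match.group(2)}


def check(tree: Dict[str, str], user_privileges: Set[Privilege]) -> Dict[str, str]:
    """Check: authorization is the first validation of the statement objects."""
    required = (tree["db"], tree["table"], tree["action"])
    if required not in user_privileges:
        raise AuthorizationError(f"missing privilege: {required}")
    return tree


def plan(tree: Dict[str, str]) -> str:
    """Plan: hand the authorized statement to the execution planner."""
    return f"PLAN scan {tree['db']}.{tree['table']}"


def run(sql: str, user_privileges: Set[Privilege]) -> str:
    return plan(check(build(parse(sql)), user_privileges))
```

Calling `run` with a user who lacks the matching privilege raises `AuthorizationError` inside `check`, so `plan` is never invoked for that statement.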

Page 14: Sentry - An Introduction

Example Security Policy

[databases]
# Defines the location of the per-DB policy file for the
# 'customers' DB (schema)
customers = hdfs://ha-nn-uri/etc/access/customers.ini

[groups]
# Assigns Hadoop groups to their respective set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# Roles that can import or export data to the URIs defined,
# i.e. a landing zone. Since the server runs as the user "hive",
# files in this directory must either have the "hive" group set
# with read/write or be set world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1

junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Role controls everything for the 'customers' DB on server1.
# Privileges for 'customers' can be defined in the global policy
# file even though 'customers' has its own policy file.
# Note that the privileges from both the global policy file and
# the per-db policy file are merged. There is no overriding.
customers_admin_role = server=server1->db=customers

# Role controls everything on server1.
admin_role = server=server1
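To show how the [groups] and [roles] sections fit together, here is a small Python sketch that reads a trimmed copy of the policy above and resolves a Hadoop group to its list of privileges. It is a hand-written illustration, not Sentry's own parser. The functions `parse_policy` and `privileges_for_group` are hypothetical, and the sketch assumes that entries are comma-separated and that a trailing backslash continues a line.

```python
from typing import Dict, List

# Trimmed copy of the slide's policy (values taken from the example).
POLICY = """
[groups]
manager = analyst_role, junior_analyst_role
jranalyst = junior_analyst_role
admin = admin_role

[roles]
analyst_role = server=server1->db=analyst1, \\
    server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
admin_role = server=server1
"""


def parse_policy(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse '[section]' blocks of 'name = a, b, c' entries."""
    joined = text.replace("\\\n", " ")  # join backslash-continued lines
    sections: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            # Split on the first '=' only; privileges contain '=' themselves.
            name, _, value = line.partition("=")
            current[name.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return sections


def privileges_for_group(policy: Dict[str, Dict[str, List[str]]], group: str) -> List[str]:
    """Resolve group -> roles -> privileges, keeping the file's order."""
    privileges: List[str] = []
    for role in policy.get("groups", {}).get(group, []):
        privileges.extend(policy.get("roles", {}).get(role, []))
    return privileges
```

With this input, the `manager` group resolves to the privileges of both `analyst_role` and `junior_analyst_role`, which mirrors the slide's note that privileges are merged rather than overridden.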

Page 15: Sentry - An Introduction

Live Demo & Give Aways

• Closes gap between HDFS and Metastore
• Easy to implement
• RFC 2307 compliant (Kerberos)
• Enables Multi-User Applications in one Hive WH
• Enables Multi-Tenancy per Row and Column

Page 16: Sentry - An Introduction

About

@mapredit | mapredit.blogspot.com

Web: http://wiki.apache.org/incubator/SentryProposal

Page 17: Sentry - An Introduction