Securing Hadoop in an Enterprise Context

Hadoop Summit 2016 Securing Hadoop in an Enterprise Context Hellmar Becker, DevOps Engineer Dublin, April 14, 2016


Page 1: Securing Hadoop in an Enterprise Context

Hadoop Summit 2016

Securing Hadoop in an Enterprise Context

Hellmar Becker, DevOps Engineer

Dublin, April 14, 2016

Page 2: Securing Hadoop in an Enterprise Context

Who am I?


Page 3: Securing Hadoop in an Enterprise Context

1. The Challenge

2. Hadoop Usage Patterns

3. Aspects of Security

4. Building Blocks for a Security Architecture

5. Questions

Page 4: Securing Hadoop in an Enterprise Context

The Challenge

Page 5: Securing Hadoop in an Enterprise Context

Data Lake and Advanced Analytics within ING

External and internal reporting for own or regulatory purposes

Integrate all data sources within the bank into one processing platform

• Batch data streams
• Live transactions
• Model building for customer interaction

Better understand customer needs in an increasingly digital world

Data can help us offer tailored products and services

Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models

Open source software where possible – Hadoop as a core component

Page 6: Securing Hadoop in an Enterprise Context

Possible consequences

• Legal consequences
• Loss of reputation
• Financial loss

Risks

• Data loss
• Privacy breach
• System intrusion

Page 7: Securing Hadoop in an Enterprise Context

By default, Hadoop "out of the box" runs without security

Hadoop user model:

• A user name is just an alphanumeric string
• So is a group name
• They do not have to match entities in the OS
• Via the REST API anybody could read or modify data

So, the security design has to be actively built!

And this is what we did.
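To make the REST API risk concrete: with the default "simple" authentication, WebHDFS takes the caller's identity from a `user.name` query parameter and does not verify it. The sketch below only builds such a URL and sends nothing. The host name, the path, the user, and port 50070 (the NameNode HTTP default in Hadoop 2.x) are illustrative assumptions, not details from the talk.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, user, port=50070):
    """Build a WebHDFS URL that claims to act as `user`.

    On a cluster without Kerberos the name in `user.name` is taken at
    face value, so any client can pick whatever identity it likes.
    """
    query = urlencode({"op": op, "user.name": user})
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, quote(path), query)


# Hypothetical host and path, for illustration only.
url = webhdfs_url("namenode.example.com", "/sales-data", "LISTSTATUS", "hdfs")
print(url)
```

Nothing in this URL proves that the caller really is `hdfs`, which is why the cluster needs Kerberos and a firewall before it holds real data.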

Page 8: Securing Hadoop in an Enterprise Context

Hadoop Usage Patterns

Page 9: Securing Hadoop in an Enterprise Context

Hadoop Usage Patterns

1. File Storage

2. Deep Data

3. Analytical Hadoop

4. (Real Time)

Page 10: Securing Hadoop in an Enterprise Context

Aspects of Security

Page 11: Securing Hadoop in an Enterprise Context

Aspects of Security

Technical: Rings of Defense

• Perimeter Level Security
• Application Level Authentication and Authorization
• OS Security
• Data Protection

See also: http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox

Conceptual: Five Pillars of Security

• Administration
• Authentication
• Authorization
• Auditing
• Data Protection

See also: http://hortonworks.com/hdp/security/

Page 12: Securing Hadoop in an Enterprise Context

Building Blocks for a Security Architecture

Page 13: Securing Hadoop in an Enterprise Context

Building Blocks: Perimeter Security

• Firewall around the entire cluster
• “Stepping stone” servers
• Citrix/Terminal server for interactive access
• Ingestion server with defined transfer paths

User model

• Personal users locally defined or with corporate directory
• Service/Technical users defined locally

Software updates and software development

• Through manually maintained mirror

Used in exploratory environments (pattern 3)

Page 14: Securing Hadoop in an Enterprise Context

Building Blocks: Internal Security

Administration

• General goal: Zero Touch deployment
• Automatic synchronization with enterprise directory
• UI access is only used for incidents

Authentication

• Kerberos
• Future: Share a KDC HA cluster among Hadoop instances
• Connecting to enterprise directory using trusts and synchronization (next chapter)
• Keep the Kerberos principals (Hadoop users) completely separate from OS users

Page 15: Securing Hadoop in an Enterprise Context

Authorization

Unified rights management with Ranger

• Service principals will be directly made known to Ranger; rights for personal accounts (PAs) are assigned only based on groups
• Groups and users are synced with Active Directory
• Ranger 0.4 cannot take away privileges that were granted on a lower level
• HDFS permissions and ACLs override Ranger
• Make sure these access paths are locked down

HDFS ACLs (No!)

• No easy-to-use GUI
• Difficult to maintain overview
• Only for HDFS, does not handle other components

> hdfs dfs -setfacl -m group:execs:r-- /sales-data

> hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
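Since HDFS ACLs override Ranger and are hard to keep track of, one way to check that these access paths are locked down is to scan `-getfacl` output for named entries such as `group:execs`. The parser below is a minimal sketch written for this transcript, not a tool from the talk. It assumes the text format shown above, plus the `default:` prefix that HDFS uses for default ACLs on directories.

```python
from collections import namedtuple

AclEntry = namedtuple("AclEntry", "scope kind name perms")


def parse_getfacl(text):
    """Split `hdfs dfs -getfacl` output into header fields and ACL entries."""
    header, entries = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# owner: bruce".
            key, _, value = line.lstrip("# ").partition(":")
            header[key.strip()] = value.strip()
            continue
        parts = line.split(":")
        scope = "access"
        if parts[0] == "default":
            # Default ACL entries carry an extra leading field.
            scope, parts = "default", parts[1:]
        kind, name, perms = parts
        entries.append(AclEntry(scope, kind, name, perms))
    return header, entries


SAMPLE = """\
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
"""

header, entries = parse_getfacl(SAMPLE)
# Entries with a name are grants beyond the plain owner/group/other bits.
named = [e for e in entries if e.name]
print(header["file"], named)
```

For the sample, the only named entry is the read grant to the `execs` group; this is the kind of grant that would not show up in Ranger's policy overview.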

Page 16: Securing Hadoop in an Enterprise Context

What We Have Done: Corporate Integration

• Personal users in corporate Active Directory, NPAs in cluster KDC
• One KDC pair per cluster
• One-way realm trust
• Custom script to synchronize Ranger

Challenges

• Learning to work in interdisciplinary teams
• Organizational boundaries
• UNIX – Windows
• Infra – Platform DevOps

Example: Ambari service connects to UNIX LDAP rather than AD

OS security and Hadoop security are not integrated

• YARN container users
• Hadoop ACLs, group mapping
• Multitenancy? (Not solved in this picture)
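The split between personal users in Active Directory and NPAs in the cluster KDC, joined by a one-way realm trust, can be pictured with a toy model. This is only a sketch of the direction of trust, not of the Kerberos protocol itself. The realm names, host names, and the `may_authenticate` helper are invented for illustration.

```python
from collections import namedtuple

Principal = namedtuple("Principal", "primary instance realm")


def parse_principal(name):
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError("not a full principal name: %r" % name)
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


# (trusting realm, trusted realm): the cluster realm accepts users from the
# corporate realm, but not the other way round. Both names are made up.
ONE_WAY_TRUSTS = {("HADOOP.EXAMPLE.COM", "CORP.EXAMPLE.COM")}


def may_authenticate(client, service, trusts=ONE_WAY_TRUSTS):
    """True if the service's realm is the client's own or trusts it."""
    c, s = parse_principal(client), parse_principal(service)
    return c.realm == s.realm or (s.realm, c.realm) in trusts


# A personal user from the corporate realm reaches a cluster service ...
print(may_authenticate("jdoe@CORP.EXAMPLE.COM",
                       "hive/node1.example.com@HADOOP.EXAMPLE.COM"))
# ... but a cluster-local account is not accepted by corporate services.
print(may_authenticate("etl@HADOOP.EXAMPLE.COM",
                       "HTTP/intranet.example.com@CORP.EXAMPLE.COM"))
```

The point of the one-way direction is that a compromised cluster account gains nothing in the corporate realm, while employees can still log in with their normal credentials.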

Page 17: Securing Hadoop in an Enterprise Context

Working Around Ranger’s Limitations

• Ranger's uxugsync process queries Active Directory through the LDAP protocol
• Ranger 0.4: Reads all users, then determines their group affiliation
• More than 50,000 employees in ING Group
• Need to limit the load on LDAP server!
• Ranger 0.5: Group driven query - still not optimal because it uses attribute filters
• Most efficient LDAP query is either by a single DN (Distinguished Name), or by container (query base DN)
• But we cannot use containers because of enterprise policy
• Solution: custom Python script that queries LDAP hierarchically
• One “supergroup” is picked by DN
• The members of the “supergroup” are all LDAP groups that have Hadoop related privileges
• Query all these groups, again by DN
• Examine the members of each group (personal users)
• Make the user-group relationships known to Ranger via REST call

Ranger User-Group API is neither documented nor supported

Database schema: creates duplicate records, inconsistent deletion behavior

OS integration should be better
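The hierarchical query can be sketched as follows. This is not the script from the talk: the function name, the `lookup` callable, and every DN are assumptions made for illustration, and an in-memory dictionary stands in for the directory. Against a real server, `lookup` would typically be a base-scope search on the given DN that returns the entry's `member` attribute. The final REST call to Ranger is left out, since that API is undocumented.

```python
def sync_user_groups(lookup, supergroup_dn):
    """Return {user DN: sorted list of group DNs}, reading entries only by DN.

    `lookup(dn)` must return a dict with a 'member' list for the entry at `dn`.
    """
    user_groups = {}
    # Step 1: one lookup for the supergroup; its members are the Hadoop groups.
    for group_dn in lookup(supergroup_dn).get("member", []):
        # Step 2: one lookup per Hadoop group; its members are personal users.
        for user_dn in lookup(group_dn).get("member", []):
            user_groups.setdefault(user_dn, set()).add(group_dn)
    return {user: sorted(groups) for user, groups in user_groups.items()}


# Stand-in for the directory; all DNs are invented for this example.
SUPER = "cn=hadoop-all,ou=groups,dc=example,dc=com"
ANALYSTS = "cn=hadoop-analysts,ou=groups,dc=example,dc=com"
ADMINS = "cn=hadoop-admins,ou=groups,dc=example,dc=com"
FAKE_DIRECTORY = {
    SUPER: {"member": [ANALYSTS, ADMINS]},
    ANALYSTS: {"member": ["cn=alice,ou=people,dc=example,dc=com",
                          "cn=bob,ou=people,dc=example,dc=com"]},
    ADMINS: {"member": ["cn=alice,ou=people,dc=example,dc=com"]},
}

mapping = sync_user_groups(FAKE_DIRECTORY.__getitem__, SUPER)
for user, groups in sorted(mapping.items()):
    print(user, groups)
```

The number of directory reads is one plus the number of Hadoop groups, independent of how many employees the directory holds, which is what keeps the load on the LDAP server small.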

Page 18: Securing Hadoop in an Enterprise Context

A Better Approach: Corporate Directory Integration

• IPA and sssd provide user/group mapping on Hadoop and OS level
• Role based access for personal users, managed through a central tool
• One user database for Hadoop services, Ambari, Ranger
• YARN, HDFS user models fall nicely into place
• Requires ING patches (HDP 2.4, Ranger 0.6)
• RANGER-827 use getent instead of files
• RANGER-842 use pam for Ranger auth
• HADOOP-12751, HIVE-4413 support ‘@’ in user name
• AMBARI-6432 support IPA KDC

Timelines! We need this prioritized by our vendor
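The idea behind "use getent instead of files" is to ask the operating system's name service for users and groups rather than parsing `/etc/passwd` and `/etc/group`, so that accounts supplied by sssd are found as well. The sketch below shows the same idea with Python's standard `pwd`, `grp`, and `os` modules. It is an illustration for a Unix host, not code from any of the patches listed above, and the helper name `os_groups` is made up.

```python
import grp
import os
import pwd


def os_groups(user):
    """Return the group names the OS reports for `user`, or None if unknown.

    pwd and grp go through the C library's lookup functions, so on a typical
    Linux host they see whatever NSS is configured to provide (local files,
    sssd, ...), much like the `getent` command does.
    """
    try:
        entry = pwd.getpwnam(user)
    except KeyError:
        return None
    names = []
    for gid in os.getgrouplist(user, entry.pw_gid):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            names.append(str(gid))  # gid without a matching group entry
    return sorted(names)


# The result depends on the host this runs on.
print(os_groups("root"))
```

When Hadoop's group mapping and the OS both resolve users through the same name service, the YARN container user and the HDFS user end up with the same group memberships.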

Page 19: Securing Hadoop in an Enterprise Context

Questions

Page 20: Securing Hadoop in an Enterprise Context

Attributions

• Hellmar in Nîmes / With Python in Mindanao, by the author
• Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
• Data Pipeline, ING OIB Image Bank
• Storm surge by David Baird is licensed under CC BY-SA 2.0; cropped by me
• Scared Girl by Victor Bezrukov - Port-42 is licensed under CC BY 2.0
• System Lock by Yuri Samoilov is licensed under CC BY 2.0; cropped by me
• Safe by Rob Pongsajapan is licensed under CC BY 2.0; cropped by me
• Hercules and Cerberus by The Los Angeles County Museum of Art is Public Domain