Taking Hadoop to Enterprise Security Standards
©2014 LinkedIn Corporation. All Rights Reserved.
Taking Hadoop to Enterprise Security Standards
Karthik Ramasamy
Harsh Singhal
Arvind Mani
Access Control
How many of you need or have access control in Hadoop?
Users First Internal Threat
Keeping Data Secure
External Threat
The more granular the access controls are, the more people can have access to the data
Hadoop – Status Quo
Multiple Query Execution Engines
Custom Code Execution
Auditing
User ID, Email Address, IP Address, Billing Address
Security, Customer Service, Data Scientist
Adding and removing group membership can take up to a few hours
HDFS file permissions are very coarse (at file level)
HDFS File Permissions
Other Access Control Solutions
Mixed Data
Multiple Data Processing Systems
Data for Everyone
Challenges
Extensible Authorization
Fine-Grained Control
Fast Changes to Authorization Rules
What do we need?
Our Solution: Access Control via Encryption
Apache Kafka
HDFS
Event name
Symmetric Encryption Key
Key Server
Parquet
ETL
Encrypted Events
User A’s Job
User B’s Job
User C’s Job
Producer Job
ETL User
Parquet File
User Columns
A 5
B 2, 5
Key Server
Access Control via Encryption
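The user-to-columns table on this slide (A: column 5; B: columns 2 and 5) can be read as the key server's authorization rule: a job receives a column's key only if the user running it is listed for that column. Below is a minimal Python sketch of that idea. The `KeyServer` class, its method names, and the one-key-per-column layout are hypothetical and not taken from the talk.

```python
import secrets


class KeyServer:
    """Toy key server: one symmetric key per column plus a per-user column ACL."""

    def __init__(self, acl):
        # acl maps a user name to the column numbers that user may decrypt.
        self._acl = {user: set(cols) for user, cols in acl.items()}
        self._keys = {}  # column number -> 32-byte key, created on first use

    def _key_for(self, column):
        if column not in self._keys:
            self._keys[column] = secrets.token_bytes(32)
        return self._keys[column]

    def producer_key(self, column):
        # Assumption: the producer/ETL job has been authenticated elsewhere.
        return self._key_for(column)

    def consumer_key(self, user, column):
        # Release the key only if the ACL lists this column for this user.
        if column not in self._acl.get(user, set()):
            raise PermissionError(f"{user} may not read column {column}")
        return self._key_for(column)


# ACL copied from the slide: user A -> column 5, user B -> columns 2 and 5.
server = KeyServer({"A": [5], "B": [2, 5]})
key_for_b = server.consumer_key("B", 2)      # granted
try:
    server.consumer_key("A", 2)              # column 2 is not listed for A
except PermissionError as err:
    print(err)
```

Because access is decided at key-release time, changing the ACL entry takes effect on the next key request, without rewriting the file.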
Columnar Storage
Page 0
Page 1
Page 2
Column a, Column b
Row group
Parquet Format
Brief Overview of Parquet
*Yet to be integrated into open source Parquet
Field Mode | Page Mode | Hybrid Mode
Page
Column
Encryption Support in Parquet*
Example: emails. Analysts need them to join with other tables but may not require access to individual emails
N Values (Page)
Encrypt one value at a time
xxxxxxx
yyyyyyy
yyyyyyy
zzzzzzz
Field Mode
Field Mode
Joins, Counts, Distribution Analysis
No/Low compression
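Joins and counts work on field-mode data only if equal plaintext values map to equal protected values. The Python sketch below illustrates that property under stated assumptions: it uses HMAC-SHA256 tokens from the standard library as a stand-in for deterministic per-value encryption (tokens are one-way, so unlike real encryption they cannot be decrypted), and the sample emails, dates, and the `field_token` helper are invented for illustration.

```python
import hashlib
import hmac
from collections import Counter


def field_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: equal inputs give equal tokens under one key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"
signups = [("ann@example.com", "2014-05-01"), ("bob@example.com", "2014-05-02")]
clicks = ["ann@example.com", "ann@example.com", "bob@example.com"]

# Each value is protected on its own, so the column keeps one token per row.
enc_signups = {field_token(key, email): day for email, day in signups}
enc_clicks = [field_token(key, email) for email in clicks]

# An analyst without the key can still count and join on the tokens.
clicks_per_user = Counter(enc_clicks)
joined = sorted((enc_signups[token], n) for token, n in clicks_per_user.items())
print(joined)
```

Every token is a high-entropy string of fixed length, so a column of tokens compresses poorly, which is consistent with the "No/Low compression" note on the slide.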
Page Mode
No information is leaked except entropy of the data
Better performance than other modes
N Values (Page)
Encode -> Compress -> Encrypt
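The order of the three page-mode steps matters: ciphertext looks random and does not compress, so compression has to come before encryption. The Python sketch below shows the order on one page of values. It is an illustration only: `keystream_xor` is a toy SHA-256 keystream cipher written to keep the example dependency-free, not a vetted cipher and not what the Parquet work uses, and the newline-joined encoding is a stand-in for Parquet's real encodings.

```python
import hashlib
import secrets
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def write_page(key: bytes, values: list) -> bytes:
    encoded = "\n".join(values).encode("utf-8")    # 1. encode (values hold no newlines)
    compressed = zlib.compress(encoded)            # 2. compress
    nonce = secrets.token_bytes(16)
    return nonce + keystream_xor(key, nonce, compressed)   # 3. encrypt


def read_page(key: bytes, page: bytes) -> list:
    nonce, body = page[:16], page[16:]
    decrypted = keystream_xor(key, nonce, body)    # XOR with the same keystream undoes step 3
    return zlib.decompress(decrypted).decode("utf-8").split("\n")


key = secrets.token_bytes(32)
values = ["ann@example.com"] * 500 + ["bob@example.com"] * 500
page = write_page(key, values)

# Reversed order for comparison: encrypting first leaves nothing to compress.
raw = "\n".join(values).encode("utf-8")
wrong_order = zlib.compress(keystream_xor(key, secrets.token_bytes(16), raw))
print(len(raw), len(page), len(wrong_order))
```

A reader without the key sees only the page's size, which matches the slide's remark that nothing leaks except the entropy of the data.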
Hybrid Mode
Finer-grained control of information
Increased overhead due to double encryption/decryption
N Values (Page)
Encrypt each value, then encrypt the page
Plain Text | Encrypted Value | No Access
Field Mode Page Mode
Hybrid Mode
Key Versioning
Each key is versioned and specific to a source (file/event name)
Reduces the exposure in case of key leakage
Time-based access control
– By default, all users can access only the last 30 days of data
– Users can be given access to data in a specific time period
Authentication of producers can be done separately
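Versioned, per-source keys make time-based access control a matter of which key versions the key server will release. The Python sketch below shows one possible check under stated assumptions: key versions are taken to be dated (for example one per day), the 30-day default comes from the slide, and the `can_access` function, the grant tuple layout, and the event names are hypothetical.

```python
from datetime import date, timedelta

DEFAULT_WINDOW = timedelta(days=30)  # default from the slide: last 30 days


def can_access(user, source, key_date, today, grants):
    """Return True if `user` may fetch the key version of `source` dated `key_date`."""
    # Default rule: any key version from the last 30 days is released.
    if timedelta(0) <= today - key_date <= DEFAULT_WINDOW:
        return True
    # Otherwise an explicit grant must cover this source and date.
    for grant_source, start, end in grants.get(user, []):
        if grant_source == source and start <= key_date <= end:
            return True
    return False


today = date(2014, 6, 30)
grants = {"alice": [("PageViewEvent", date(2014, 1, 1), date(2014, 1, 31))]}

print(can_access("bob", "PageViewEvent", date(2014, 6, 15), today, grants))    # recent data
print(can_access("bob", "PageViewEvent", date(2014, 1, 15), today, grants))    # old, no grant
print(can_access("alice", "PageViewEvent", date(2014, 1, 15), today, grants))  # old, granted
```

Since each key version covers one source and a bounded time span, a leaked key exposes only that slice of the data.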
Better Auditing Coverage
Retention Enforcement
Key Server Features
Multifactor Authentication
PIG Usage
Thank you!