Cloudera training: secure your Cloudera cluster

Post on 21-Jan-2018



© Cloudera, Inc. All rights reserved.


The demand for skills is high and Hadoop is the future. Customers cannot afford to move slowly in staffing their Big Data projects. Customers are building plans to ensure projects are staffed with skilled employees, and supported by a qualified services provider.

Job Trends from Indeed.com

What are you most concerned about when it comes to your readiness for big data and Hadoop?

Cloudera MDP webinar poll results, July 2016


Why Cloudera training?

Aligned to best practices and the pace of change

1. Broadest range of courses: Learning paths for Developer, Admin, Analyst
2. Most experienced instructors: More than 40,000 trained since 2009
3. Leader in certification: Over 12,000 accredited Cloudera professionals
4. Trusted source for training: 100,000+ people have attended online courses
5. State of the art curriculum: Courses updated as Hadoop evolves
6. Widest geographic coverage: Most classes offered (50 cities worldwide plus online)
7. Most relevant platform & community: CDH deployed more than all other distributions combined
8. Depth of training material: Hands-on labs and VMs support live instruction
9. Ongoing learning: Video tutorials and e-learning complement training
10. Commitment to big data education: University partnerships to teach Hadoop in colleges


Creating leaders in the field

Training enables Big Data solutions and innovation

• 94%: Would recommend or highly recommend Cloudera training to friends or colleagues
• 88%: Indicate Cloudera training provided the Hadoop expertise their roles require
• 66%: Draw on lessons from Cloudera training on at least a monthly basis
• 40%: Develop new apps or perform business-critical analyses as a result of training alone

Sources: Cloudera Past Public Training Participant Study, December 2012; Cloudera Customer Satisfaction Study, January 2013.


What is available from Cloudera University?

• Private training: Course delivered at location of customer choice to internal audience

• Public training: Courses regularly scheduled around the globe. Schedule available on web

• Virtual training: Live training accessed via the internet; available for public and private courses

• OnDemand training: Pre-recorded lecture with identical content/exercises as live training options

• Certification: Rigorously developed and meaningful bodies of knowledge

OnDemand | Virtual live classroom | Private onsite | Public live classroom


Suggested Cloudera University curricula

Developers

• Python/Scala Training

• Developer for Spark and Hadoop

• CCA: Spark and Hadoop Developer

• Spark ML & Kafka modules

• Topic specific training (Search, HBase)

• Hands on practice

• CCP: Data Engineer

Administrators

• Cloudera Administration training

• CCA: Administrator

• Cloudera Security OnDemand

Data Analysts/Data Scientists

• Data Analyst: Using Hive, Pig & Impala

• CCA: Data Analyst

• Cloudera Data Science


Security for Hadoop

Carlo Lazzaris | Technical Instructor


Security Webinar Agenda

1. The need for Hadoop Security
   • Hacker news and legal regulations
2. Cloudera Security Implementation
   • Five levels of security
3. How to secure your Cloudera cluster
   • Cloudera Documentation
   • Cloudera professional services
   • Cloudera OnDemand security course


The need for Hadoop security


Unguarded data stores are the victims


Regulatory Compliance

Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR
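As a rough worked example of that ceiling: GDPR Article 83(5) sets the upper limit at the greater of €20 million or 4% of total worldwide annual turnover. The sketch below only computes that maximum. The function name and the sample turnover figures are illustrative assumptions, and the result is a cap, not a predicted fine.

```python
from decimal import Decimal

# Upper-tier ceiling under GDPR Article 83(5): the greater of a fixed
# amount or a percentage of total worldwide annual turnover.
FIXED_CAP_EUR = Decimal("20000000")
TURNOVER_RATE = Decimal("0.04")


def max_gdpr_fine(annual_global_turnover_eur: Decimal) -> Decimal:
    """Return the maximum possible fine (a ceiling, not a prediction)."""
    return max(FIXED_CAP_EUR, annual_global_turnover_eur * TURNOVER_RATE)


if __name__ == "__main__":
    # Hypothetical turnover figures, chosen only for illustration.
    for turnover in (Decimal("100000000"), Decimal("2000000000")):
        print(turnover, "->", max_gdpr_fine(turnover))
```

For an organization with €100 million turnover the fixed €20 million cap applies, because 4% of turnover is only €4 million; the percentage takes over once turnover exceeds €500 million.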


Cloudera security implementation


Cloudera Enterprise CDH

The modern platform for machine learning and analytics optimized for the cloud

[Diagram: platform layers]
• Core services: Data Engineering, Operational Database, Analytic Database, Data Science
• Extensible services
• Data Catalog | Ingest & Replication | Security | Governance | Workload Management
• Storage services: S3, ADLS, HDFS, Kudu


Open platform services

Built for multi-function analytics | Optimized for cloud

• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all relevant data and increases compliance
• Easy workload management – increases user productivity and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions


Cloudera Enterprise-Grade Security and Governance

• Identity: Validate users by membership in enterprise directory. Technical concepts: authentication, user/group mapping. Component: Cloudera Manager.
• Access: Defining what users and applications can do with data. Technical concepts: permissions, authorization. Component: Apache Sentry.
• Data Protection: Shielding data in the cluster from unauthorized visibility. Technical concepts: encryption at rest & in motion. Component: Navigator Encrypt & Key Trustee.
• Visibility: Reporting on where data came from and how it’s being used. Technical concepts: auditing, lineage. Component: Cloudera Navigator.


Cloudera Certified Technology Partners

Data Sources | Data Ingest | Process, Refine & Prep | Data Discovery | Advanced Analytics

Connected Machines/Data sources

Other Data Sources


Certification ensures a product integrates securely

• Authentication: Authenticate via Kerberos or LDAP
• Authorization: Handle Apache Sentry with Hive, Impala, Search, HDFS
• Encryption: Support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption


Vulnerability Response and Process

Vulnerability reports (Upstream, Internal, External) → Fix → Publish


Cluster Security Levels



Enterprise Encryption Performance


Disclaimer

This talk serves as a general guideline for security implementation on Hadoop.

The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.


Non-secure #0

Data Free for All


Firewall

ActiveDirectory/KDC

Hadoop cluster

Cloudera Manager

Gateway node

Cloudera Worker nodes

Datacenter

Applications


4 modes of Identity Management

1. Simple Authentication
2. Kerberos
3. LDAP
4. SAML

File group ownership
• AD integration
• SSSD or Centrify

Consideration in large enterprises.

via SSSD

via


Simple Authentication: detecting the user

Firewall

ActiveDirectory

Master

Worker Worker Worker

Cloudera Manager

Master

(SSSD/Centrify)


Simple authentication = no authentication


Minimal Security #1

Reduce Risk Exposure


How it works: Authentication

• Web UIs: LDAP and SAML authentication options
• SQL Access: LDAP/AD and Kerberos authentication options
• Command Lines: Kerberos authentication; automation provided by Cloudera Manager to leverage Active Directory (AD)

1. User authenticates to AD or KDC
2. Authenticated user gets Kerberos ticket
3. Ticket grants access to services, e.g. Impala

User [ssmith]

Password [***** ]


Kerberos

Strong Authentication

KDC: Key Distribution Center
• MIT
• ActiveDirectory (more common)

Principal: user@EXAMPLE.COM
• primary: user
• realm: EXAMPLE.COM

[Diagram: user, KDC and Hadoop within the EXAMPLE.COM realm]
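To make the primary/realm labelling concrete, here is a minimal sketch that splits a principal name of the form primary[/instance]@REALM into its parts. The `Principal` type and `parse_principal` function are hypothetical helpers written for this illustration, not part of any Kerberos library, and the parsing is simplified: it ignores the escaped characters that real Kerberos libraries handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts.

    Simplified: does not handle escaped '/' or '@' characters.
    """
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("user@EXAMPLE.COM"))
```

A user principal such as user@EXAMPLE.COM has no instance component. Service principals usually carry the host name there, for example impala/worker1.example.com@EXAMPLE.COM (a made-up host).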


Kerberos

Considerations in large corporations

Time synchronization

CM Kerberos Wizard

• Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create/manage Kerberos principals
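Time synchronization matters because Kerberos rejects requests when the client and KDC clocks differ by more than the permitted skew. A minimal sketch of that comparison follows, assuming the MIT Kerberos default of 300 seconds; the function name and the timestamps are illustrative, and a site may configure a different tolerance.

```python
from datetime import datetime, timedelta, timezone

# 300 seconds is the MIT Kerberos default for the clockskew setting;
# a site may configure a different value.
MAX_CLOCK_SKEW = timedelta(seconds=300)


def within_skew(host_time: datetime, kdc_time: datetime,
                tolerance: timedelta = MAX_CLOCK_SKEW) -> bool:
    """True if the two clocks differ by no more than the tolerance."""
    return abs(host_time - kdc_time) <= tolerance


if __name__ == "__main__":
    # Hypothetical timestamps for illustration.
    kdc = datetime(2018, 1, 21, 12, 0, tzinfo=timezone.utc)
    print(within_skew(kdc + timedelta(seconds=120), kdc))
    print(within_skew(kdc - timedelta(minutes=6), kdc))
```

In practice the fix is to run a time service such as NTP on every cluster host and on the KDC, rather than to widen the tolerance.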


Kerberos Authentication

* LDAP over SSL


Authorization/Access Control

Access Control Lists (ACLs)
• HDFS file ACLs
• YARN job submission
• HBase ACLs
• Oozie ACLs

Sentry managed (RBAC)
• Hive
• Impala
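The role-based model used for Hive and Impala can be sketched as a chain of lookups: users belong to groups, roles are granted to groups, and privileges are granted to roles. The code below is a toy model of that idea only. It is not Sentry's API, it does not model Sentry's privilege hierarchy (server, database, table), and every user, group, role and table name is hypothetical sample data.

```python
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str]  # (action, object), e.g. ("SELECT", "sales.orders")

# All names below are hypothetical sample data.
GROUPS_BY_USER: Dict[str, Set[str]] = {"ssmith": {"analysts"}}
ROLES_BY_GROUP: Dict[str, Set[str]] = {"analysts": {"sales_reader"}}
PRIVILEGES_BY_ROLE: Dict[str, Set[Privilege]] = {
    "sales_reader": {("SELECT", "sales.orders")},
}


def is_allowed(user: str, action: str, obj: str) -> bool:
    """Resolve user -> groups -> roles -> privileges and look for a match."""
    for group in GROUPS_BY_USER.get(user, set()):
        for role in ROLES_BY_GROUP.get(group, set()):
            if (action, obj) in PRIVILEGES_BY_ROLE.get(role, set()):
                return True
    return False


if __name__ == "__main__":
    print(is_allowed("ssmith", "SELECT", "sales.orders"))
    print(is_allowed("ssmith", "INSERT", "sales.orders"))
```

Because privileges attach to roles rather than to individual users, moving a user between directory groups changes what they can access without touching any grants.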


Auditing


Backup/Disaster Recovery

Cloudera Backup/Disaster Recovery (BDR)

• A high performance data replicator

• Copies incremental data on the source cluster at specified schedules

Supports
• Kerberos
• Data encryption
• HDFS replication to cloud


Kerberized BDR Best Practice

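• Production cluster: realm PROD.EXAMPLE.COM, with its own KDC
• DR cluster: realm DR.EXAMPLE.COM, with its own KDC
• Cross-realm trust between the two KDCs
• Cloudera BDR replicates between Production and DR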


More Security #2

Managed, Secure, Protected


Data In-Motion Encryption

RPC encryption

Data transport encryption
• Supports AES CTR, up to 256-bit key length

HTTP TLS/SSL encryption
• No self-signed certificates in production

[Diagram: Application, Master and Worker nodes; links labeled RPC encryption, Transport encryption, TLS/SSL]
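One way to spot certificates that break the "no self-signed certificates in production" rule is to compare issuer and subject. The sketch below is only that heuristic: it takes a certificate dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()` and does not verify the signature, so a match is a warning rather than proof. The function name and the sample certificate are assumptions made for illustration.

```python
from typing import Any, Dict


def looks_self_signed(cert: Dict[str, Any]) -> bool:
    """Heuristic: a certificate whose issuer equals its subject.

    `cert` is a dict in the shape returned by Python's
    ssl.SSLSocket.getpeercert(). The signature is not verified here.
    """
    subject = cert.get("subject")
    issuer = cert.get("issuer")
    return subject is not None and subject == issuer


if __name__ == "__main__":
    # Hypothetical certificate fields for illustration.
    name = ((("commonName", "cm.example.com"),),)
    print(looks_self_signed({"subject": name, "issuer": name}))
```

Note that `getpeercert()` returns an empty dictionary when the peer certificate was not validated, so a check like this fits best in an inventory script that loads the relevant trust store explicitly.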


Data At-Rest Encryption

Transparent encryption

Supports any Hadoop applications

Encryption Zone

$ hadoop key create mykey

$ hadoop fs -mkdir /zone

$ hdfs crypto -createZone -keyName mykey -path /zone

/

/tmp /zone

foo bar

Encryption zone


Key Management Server Deployment (non-prod)

HDFS NameNode

Client

Java Keystore

KMS

Keystore file

Separation of duties
• Encryption Zone Key (EZK) is stored in the KMS server
• HDFS superuser cannot decrypt files


Key Management Server/Key Trustee Server Deployment

HDFS NameNode

Client

Key Trustee KMS

Key Trustee KMS

Firewall

Key Trustee Server

(Active)

Key Trustee Server

(Passive)

synchronization

(or more)


KMS+KTS+HSM Deployment

HDFS NameNode

Client HSM KMS

HSM KMS

Firewall

Key Trustee Server

(Active)

Key Trustee Server

(Passive)

synchronization

Key HSM

(or more)

Key HSM

HSM

HSM


Troubleshooting: Encryption Performance Anomaly

• Configuration

• AES-NI Hardware acceleration

• OpenSSL library

• Entropy
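Two of the items above can be checked quickly on a Linux host: whether the CPU advertises the AES instruction set, and how much entropy the kernel reports. The sketch below assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line and `/proc/sys/kernel/random/entropy_avail` exists. The helper names are illustrative, and the script reports values only, without judging whether they are sufficient.

```python
from pathlib import Path

CPUINFO = Path("/proc/cpuinfo")
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def has_aes_flag(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in /proc/cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def parse_entropy(entropy_text: str) -> int:
    """Parse the integer reported by entropy_avail."""
    return int(entropy_text.strip())


if __name__ == "__main__":
    if CPUINFO.exists():
        print("AES flag present:", has_aes_flag(CPUINFO.read_text()))
    if ENTROPY_AVAIL.exists():
        print("Entropy available:", parse_entropy(ENTROPY_AVAIL.read_text()))
```

A CPU flag alone does not show that the JVM or OpenSSL build in use actually takes advantage of the hardware, which is why the configuration and library items are listed separately.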


Fine Grained Access Control with Apache Sentry


Most Security #3

Secure Data Vault


Level 3 Secure Data Vault

• All data, both data-at-rest and data-in-transit, is encrypted

• Key management system is fault-tolerant

• Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example)

• Auditing extends from EDH to the other systems that integrate with it

• Cluster administrators are well-trained

• Security procedures have been certified by an expert

• Cluster can pass technical review


Data Redaction

Personally Identifiable Information
• PCI-DSS, HIPAA

Best practices followed

Passwords
• stored in credential files, not in configuration

Logs, queries
• Cloudera Manager
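Redaction of logs and queries is commonly expressed as a list of search-and-replace rules applied to each line before it is stored. The sketch below shows that general idea in plain Python. It is not Cloudera Manager's implementation or configuration format, and the two patterns (a US Social Security number shape and a 16-digit card number shape) are hypothetical examples that would need review against real data.

```python
import re

# Hypothetical rules for illustration: (compiled pattern, replacement).
RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "XXX-XX-XXXX"),
    (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),
]


def redact(text: str) -> str:
    """Apply every rule in order and return the masked text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(redact("SELECT * FROM t WHERE ssn = '123-45-6789'"))
```

Pattern-based masking only catches values that match a known shape, so it complements, rather than replaces, keeping sensitive values out of logs in the first place.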


Full Encryption

Encrypt Data Spills

• MapReduce

• Impala

• Hive

• Flume

OS-level encryption

• Navigator Encrypt


How to secure your Cloudera cluster


Cloudera Documentation


Cloudera Professional Services security engagement

• Review security requirements and provide an overview of data security policies

• Audit architecture and current systems for security policies and best practices

• Custom-tailor a security reference architecture

• Optimize OS and Java to take advantage of hardware-based crypto-acceleration

• Install and configure Kerberos with MIT Kerberos KDC or Active Directory

• Install and configure Sentry and Cloudera Navigator (license required)

• Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust

• Review fine-grained permissions on sample data using Sentry

• Review audit and lineage on sample data using Navigator

• Use Cloudera Manager and Hue to review security integration for users

• Enable and configure HDFS encryption

https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html


Cloudera online OnDemand security course

• Online self-paced training course: https://ondemand.cloudera.com

• Launch planned for mid-February 2018

• An estimated 3 days' worth of content, covering Cloudera security levels 1 and 2

• Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations:

1. Security overview

2. Security Architecture

3. Host Security

4. Encrypting Data in motion

5. Authentication

6. Authorization

7. Encrypting Data at Rest

8. Auditing

9. Additional Considerations: Data Governance


OnDemand security course instructor-guided demos

1. Potential Attack vectors

2. Securing the cluster hosts

3. Generating and managing keys for TLS

4. Configuring Cloudera Manager for TLS

5. Encrypting Data in Motion

6. Hadoop default authentication

7. Kerberizing Cluster with MIT Kerberos

8. Kerberizing Cluster with Active Directory

9. Configuring authorization with Cloudera Manager

10. Controlling access to YARN

11. Controlling access to HDFS

12. Controlling access to Tables

13. Enabling HDFS Encryption

14. Protecting local data with NavEncrypt

15. Using Navigator for auditing

16. Reassessing cluster security


OnDemand security course disclaimer

THIS IS REALLY IMPORTANT:

The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a cluster using the CentOS 7.2 operating system.

Given the almost limitless permutations of possible configurations, including different versions of CDH, Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other tools, as well as variations in policies, laws, and practices that affect each organization differently, it's impossible for a training course to cover all aspects of security.

This course is meant to provide a background that will help you to understand many important concepts and techniques, but is not intended as a replacement for the relevant documentation or a consulting engagement with an expert who can provide advice based on your specific requirements.

• Disclaimers apply because of the variety and permutations of security configurations

• Versions used: CDH 5.12 and CentOS 7.2


OnDemand security course scenario

• Many of our demonstrations are based on a hypothetical scenario

• However, the concepts should apply to nearly any organization

• Loudacre Mobile is a fast-growing wireless carrier

• Employees serving in a variety of roles

• Data ingested from many sources, in many formats

• Data processed by many tools


OnDemand security course environment


Comprehensive demonstration cluster


Sample chapter structure: Encrypting Data in Motion

• Encryption Fundamentals

• Certificates

• Key Management

Instructor-Led Demonstration: Generating and Managing Keys for TLS

• Configuring Cloudera Manager for TLS

Instructor-Led Demonstration: Configuring Cloudera Manager for TLS

• Encrypting Hadoop’s Data in Motion

Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion

• Essential Points


Register your interest for the OnDemand security course:

peter.rizvi@cloudera.com


Thank you