Curb Your Insecurity with HDP
Tips for a Secure Cluster (with Spark too)
Hadoop Summit San Jose
June 29th, 2016
© Hortonworks Inc. 2011-2016. All Rights Reserved
Hortonworks: Powering the Future of Data
Pardeep Kumar
Sr. Systems Architect, NA Prof. Services
4+ years in Hadoop
Helping Fortune 500 customers succeed in their Hadoop journey
Set up, implement, migrate and secure some of the largest clusters in North America
Security & Migration SME, HCC Guru
Loves Hadoop, Cricket and Kerberos ;)
@hadooptutor
linkedin.com/in/pardeepkumarmishra
Ancil McBarnett
Sr. Solutions Engineer, NorthEast
Helping organizations design, implement, operate and consume Hadoop and Big Data Solutions. Specialize in Security and Hive Tuning. HCC Guru.
Loves Cricket, and DJ Bravo Champion :D
@mcbkingdom
linkedin.com/in/mcbkingdom
Hadoop Security in 4 Steps
Comprehensive Approach to Security
In order to protect any data system you must implement the following:
Administration: Central management and consistent security. How do I set policy across the entire cluster?
Authentication: Authenticate users and systems. Who am I/prove it?
Authorization: Provision access to data. What can I do?
Audit: Maintain a record of data access. What did I do?
Data Protection: Protect data at rest and in motion. How can I encrypt at rest and over the wire?
HDP Security: Comprehensive, Complete, Extensible
Perimeter Level Security: Network security (e.g. firewalls), Apache Knox (e.g. gateways)
Authentication: LDAP/AD, Kerberos
Data Protection: Encrypts data in motion and data at rest (HDFS TDE with Ranger KMS); refer to partner encryption solutions for broader needs
Authorization & Audit: Consistent authorization controls across all Apache components within HDP (Apache Ranger)
Authentication with Kerberos
Kerberos is a necessary evil, just do it!!
Security Without Kerberos
Configure Kerberos Ambari Wizard
Security With Kerberos
Apache Ranger
Apache Ranger
HDFS File Security
Hive Database and Table Security
Authorization and Audit
Authorization: fine-grained access control
  HDFS: folder, file
  Hive: database, table, column
  HBase: table, column family, column
  Storm, Knox and more
Audit: extensive user access auditing in HDFS, Hive and HBase
  IP address
  Resource type/resource
  Timestamp
  Access granted or denied
Control access into system
Flexibility in defining policies
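To make the policy and audit ideas above concrete, here is a minimal Python sketch of a default-deny policy check that writes one audit record per decision. It is not Ranger's implementation or API: the `Policy`, `AuditEvent` and `check_access` names, the sample user, IP address and paths are all hypothetical, and the prefix match is deliberately naive. The audit fields mirror the ones listed above (IP address, resource type/resource, timestamp, access granted or denied).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which users may perform which actions on a resource prefix."""
    resource_type: str      # e.g. "hdfs" or "hive"
    resource_prefix: str    # e.g. "/data/sales"
    users: frozenset
    actions: frozenset


@dataclass(frozen=True)
class AuditEvent:
    """One audit record per access decision, granted or denied."""
    user: str
    ip_address: str
    resource_type: str
    resource: str
    action: str
    timestamp: datetime
    granted: bool


def check_access(policies, audit_log, user, ip_address, resource_type, resource, action):
    """Grant only if some policy matches (default deny), and always audit the decision."""
    granted = any(
        p.resource_type == resource_type
        and resource.startswith(p.resource_prefix)  # naive prefix match, illustration only
        and user in p.users
        and action in p.actions
        for p in policies
    )
    audit_log.append(
        AuditEvent(user, ip_address, resource_type, resource, action,
                   datetime.now(timezone.utc), granted)
    )
    return granted


policies = [Policy("hdfs", "/data/sales", frozenset({"johndoe"}), frozenset({"read"}))]
audit_log = []
print(check_access(policies, audit_log, "johndoe", "10.0.0.5", "hdfs", "/data/sales/2016.csv", "read"))   # True
print(check_access(policies, audit_log, "johndoe", "10.0.0.5", "hdfs", "/data/sales/2016.csv", "write"))  # False
```

Both the granted and the denied request end up in `audit_log`, which is the property the audit list above relies on.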
REST API Security with Apache Knox
Hadoop REST APIs
Useful for connecting to Hadoop from outside the cluster
When more client language flexibility is required, e.g. a Java binding is not an option
Challenges
  Client must have knowledge of cluster topology
  Required to open ports (and in some cases, on every host) outside the cluster
Service: API
WebHDFS: Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat: Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive: Hive REST API operations
HBase: HBase REST API operations
Oozie: Job submission and management, and Oozie administration.
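As an illustration of the WebHDFS operations listed above, the following Python sketch builds direct (non-Knox) WebHDFS URLs. The host name `namenode-host` and port 50070 are the ones used elsewhere in this deck; the helper name `webhdfs_url` and the sample paths are hypothetical. It only constructs URLs and does not contact a cluster; on a Kerberized cluster the actual request would also need SPNEGO authentication, which is not shown.

```python
from urllib.parse import quote, urlencode

# HTTP method used by each WebHDFS operation shown here.
HTTP_METHOD = {
    "LISTSTATUS": "GET",
    "OPEN": "GET",
    "MKDIRS": "PUT",
    "RENAME": "PUT",
    "SETPERMISSION": "PUT",
}


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS REST URL for an HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


for op, path, extra in [
    ("LISTSTATUS", "/tmp", {}),
    ("MKDIRS", "/tmp/demo", {"permission": "755"}),
    ("RENAME", "/tmp/demo", {"destination": "/tmp/demo2"}),
]:
    print(HTTP_METHOD[op], webhdfs_url("namenode-host", path, op, **extra))
```

Note that the client has to know which host runs the NameNode and which port it listens on, which is exactly the topology knowledge listed under Challenges.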
API Security with Knox
Authentication
Eliminates SSH edge node
Central API management
Central audit control
Service level authorization
SSO integration: Siteminder and OAM
LDAP and AD integration
Incubated and led by Hortonworks, Apache Knox extends the reach of the Hadoop REST APIs without Kerberos complexities
Integrated with existing systems to simplify identity maintenance
Single, simple point of access for a cluster
Central controls ensure consistency across one or more clusters
Kerberos encapsulation
Single Hadoop access point
REST API hierarchy
Consolidated API calls
Multi-cluster support
Hadoop REST API with Knox
Service   Direct URL                             Knox URL
WebHDFS   http://namenode-host:50070/webhdfs     https://knox-host:8443/webhdfs
WebHCat   http://webhcat-host:50111/templeton    https://knox-host:8443/templeton
Oozie     http://ooziehost:11000/oozie           https://knox-host:8443/oozie
HBase     http://hbasehost:60080                 https://knox-host:8443/hbase
Hive      http://hivehost:10001/cliservice       https://knox-host:8443/hive
YARN      http://yarn-host:yarn-port/ws          https://knox-host:8443/resourcemanager
Direct access: masters could be on many different hosts
Through Knox: one host, one port; consistent paths; SSL config at one host
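The "one host, one port, consistent paths" idea can be sketched in a few lines of Python. The service-to-path mapping is copied from the Knox URL column above; the function name `knox_url` and the `gateway_prefix` parameter are hypothetical. The slide abbreviates the Knox URLs; in many Knox installs the path also carries a gateway path and topology name (for example `gateway/default`), so treat that prefix as an assumption to check against your own deployment.

```python
# Knox path per service, as listed in the Knox URL column above.
KNOX_PATHS = {
    "WebHDFS": "webhdfs",
    "WebHCat": "templeton",
    "Oozie": "oozie",
    "HBase": "hbase",
    "Hive": "hive",
    "YARN": "resourcemanager",
}


def knox_url(service, knox_host="knox-host", knox_port=8443, gateway_prefix=""):
    """Return the Knox entry point for a service: always the same host and port.

    gateway_prefix is an assumption for deployments whose URLs include a
    gateway path and topology name, e.g. "gateway/default".
    """
    parts = [p for p in (gateway_prefix.strip("/"), KNOX_PATHS[service]) if p]
    return f"https://{knox_host}:{knox_port}/" + "/".join(parts)


print(knox_url("WebHDFS"))
print(knox_url("YARN", gateway_prefix="gateway/default"))
```

Whatever the service, the client only needs to reach `knox-host` on 8443 over HTTPS, so firewall rules and SSL configuration stay in one place.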
Hadoop REST API Security: Drill-Down
[Diagram: a REST client connects over HTTP through a firewall to a load balancer (LB) and Knox Gateway (GW) instances in the DMZ. Knox talks to the enterprise identity provider (LDAP/AD) over LDAP and forwards HTTP requests through a second firewall to Hadoop Cluster 1 and Hadoop Cluster 2. Each cluster shows masters (NN, RM, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM). The edge node/Hadoop CLIs connect over RPC.]
Note: the arrows to the Hadoop clusters are simplifications.
Data Protection
Data Protection
HDP allows you to apply data protection policy at different layers across the Hadoop stack
Layer: Storage and Access
  What? Encrypt data while it is at rest
  How? HDFS Transparent Data Encryption, partners, HBase encryption, OS-level encryption
Layer: Transmission
  What? Encrypt data as it moves
  How? SSL, SASL, RPC
Points of Communication
[Diagram: numbered communication channels between a client and the nodes of a Hadoop cluster: WebHDFS, DataTransferProtocol, RPC, JDBC/ODBC, and the M/R shuffle between nodes.]
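Each of the channels above is typically secured by its own Hadoop property. The Python sketch below pairs the channels with the standard property that is commonly used to turn on encryption for them and renders each as a Hadoop-style `<property>` element. The pairing is my own summary rather than something stated on the slide, the helper name `to_property_xml` is hypothetical, and these settings alone are not a complete configuration (SSL, for instance, also needs keystores and truststores).

```python
from xml.etree import ElementTree as ET

# (channel, config file, property, value); an illustrative summary, not a full config.
WIRE_ENCRYPTION = [
    ("RPC", "core-site.xml", "hadoop.rpc.protection", "privacy"),
    ("DataTransferProtocol", "hdfs-site.xml", "dfs.encrypt.data.transfer", "true"),
    ("WebHDFS", "hdfs-site.xml", "dfs.http.policy", "HTTPS_ONLY"),
    ("M/R Shuffle", "mapred-site.xml", "mapreduce.shuffle.ssl.enabled", "true"),
    ("JDBC/ODBC", "hive-site.xml", "hive.server2.use.SSL", "true"),
]


def to_property_xml(name, value):
    """Render one Hadoop configuration property as an XML snippet."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


for channel, conf_file, name, value in WIRE_ENCRYPTION:
    print(f"{channel} ({conf_file}): {to_property_xml(name, value)}")
```

In an Ambari-managed cluster these values would normally be set through the Ambari configuration screens rather than by editing the XML files by hand.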
Data Protection - HDFS Encryption
[Diagram: HDFS clients read and write encrypted files in an HDFS encryption zone managed by the NameNode. Keys come from a Key Management System (KMS) through the KeyProvider API (partner integration point), with open source key management and stateless key management shown as options alongside the HDP stack (data access, data management, security, YARN) and partners.]
Leverage native HDFS Transparent Data Encryption or commercial solutions such as Protegrity
Hortonworks is collaborating with partners to deliver enterprise-scale key management and more choices to customers
  Open source KMS with Ranger
  Or partner with commercial KMS solutions, e.g. Voltage KMS
Partner joint engineering resources
  Voltage Stateless Key Management integrated with the KeyProvider API
Only HDP offers open source and commercial choices for key management
Demo Transparent Data Encryption
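The demo itself is not captured in the slides. As a rough sketch of what setting up an encryption zone usually involves, the Python snippet below assembles the standard `hadoop key` and `hdfs crypto` commands without running them. The key name `demo_key`, the path `/secure/demo` and the function name `tde_demo_commands` are hypothetical; on a real cluster the key is created in the configured KMS, the zone directory must be empty, and creating the zone requires HDFS superuser rights.

```python
import shlex


def tde_demo_commands(key_name, zone_path):
    """Return the CLI steps for creating an HDFS encryption zone, as argv lists."""
    return [
        # 1. Create an encryption key in the configured KMS.
        ["hadoop", "key", "create", key_name],
        # 2. Create the (empty) directory that will become the zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Turn the directory into an encryption zone tied to that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # 4. Confirm the zone exists.
        ["hdfs", "crypto", "-listZones"],
    ]


for argv in tde_demo_commands("demo_key", "/secure/demo"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Files written under the zone are then encrypted and decrypted transparently by the HDFS client, provided the user is allowed to use the key.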
Securing Spark Deployments
Spark - Authentication
Spark leverages Kerberos on YARN
[Diagram: John Doe, the KDC, and a Hadoop cluster with the Spark AM, executors and the NameNode (NN). Steps shown:]
  kinit
  Get service ticket (ST) for Spark
  Use Spark ST, submit Spark job
  Spark gets NameNode (NN) service ticket
  YARN launches Spark executors using John Doe's identity
  Executor reads from HDFS using John Doe's delegation token
John Doe first authenticates to Kerberos before launching Spark Shell:
kinit -kt /etc/security/keytabs/johndoe.keytab [email protected]
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Spark Authorization
[Diagram: John Doe, the KDC, Ranger, HDFS, and a YARN cluster with nodes A, B and C. Steps shown:]
  Client gets service ticket for Spark
  Use Spark ST, submit Spark job
  Get NameNode (NN) service ticket
  Executors read from HDFS
  Ranger checks: Can John launch this job? Can John read this file?
Controlling HDFS authorization is easy/done
Controlling Hive row/column level authorization in Spark is WIP
Spark Channel Encryption - Example
Channels: Shuffle Data, Control/RPC, Shuffle BlockTransfer, Read/Write Data, FS Broadcast/File Download
Settings and notes:
  spark.authenticate.enableSaslEncryption = true
  spark.authenticate = true. Leverage YARN to distribute keys
  Read/write depends on the data source: for HDFS, RPC (RC4 | 3DES) or SSL for WebHDFS
  NM -> Executor leverages YARN-based SSL
  spark.ssl.enabled = true
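The three Spark properties named above can be supplied either in `spark-defaults.conf` or as `--conf` flags on `spark-submit`. The Python sketch below renders both forms from one dictionary; the helper names `as_spark_defaults` and `as_submit_args` are hypothetical. Enabling `spark.ssl.enabled` also needs keystore and truststore settings that the slide does not list, so this is a starting point rather than a complete configuration.

```python
# Properties taken from the slide above.
SPARK_SECURITY_CONF = {
    "spark.authenticate": "true",
    "spark.authenticate.enableSaslEncryption": "true",
    "spark.ssl.enabled": "true",
}


def as_spark_defaults(conf):
    """Render properties as spark-defaults.conf lines (key, whitespace, value)."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


def as_submit_args(conf):
    """Render properties as spark-submit arguments: --conf key=value."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(as_spark_defaults(SPARK_SECURITY_CONF))
print(" ".join(as_submit_args(SPARK_SECURITY_CONF)))
```

Keeping the settings in `spark-defaults.conf` applies them to every job, while `--conf` flags are handy for testing one job at a time.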
Gotchas with Spark Security
Client -> Spark Thrift Server -> Spark Executors: no identity propagation on 2nd hop
  Forces STS to run as Hive user to read all data
  Reduces security
  Use SparkSQL via shell or programmatic API
  https://issues.apache.org/jira/browse/SPARK-5159
SparkSQL: granular security unavailable
  Ranger integration will solve this problem (refer to talk in Room 210A for Security in Spark and Hive)
  Brings row/column level/masking features to SparkSQL
Spark + HBase with Kerberos
  Issue fixed in Spark 1.4 (SPARK-6918)
Spark Streaming + Kafka + Kerberos + SSL
  Issues fixed in HDP 2.4.x
Spark jobs > 72 hours
  Kerberos token not renewed, fixed in Spark 1.5+
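For the long-running job case, one common approach on YARN is to hand `spark-submit` a principal and keytab through its `--principal` and `--keytab` options, so that credentials can be re-obtained while the job runs. The slide does not say which mechanism it has in mind, so treat this pairing as an assumption. The Python sketch below only assembles the command line; the principal `johndoe@EXAMPLE.COM`, the jar name and the class name are hypothetical, and the keytab path is the one used in the kinit example earlier in the deck.

```python
def long_running_submit_args(principal, keytab, main_class, app_jar):
    """Assemble a spark-submit command for a long-running job on a Kerberized YARN cluster."""
    return [
        "spark-submit",
        "--master", "yarn-cluster",   # same master setting as the SparkPi example in this deck
        "--principal", principal,     # Kerberos principal to log in as
        "--keytab", keytab,           # keytab used to renew credentials over time
        "--class", main_class,
        app_jar,
    ]


args = long_running_submit_args(
    principal="johndoe@EXAMPLE.COM",                  # hypothetical principal
    keytab="/etc/security/keytabs/johndoe.keytab",
    main_class="com.example.StreamingJob",            # hypothetical class
    app_jar="streaming-job.jar",                      # hypothetical jar
)
print(" ".join(args))
```

The keytab is a long-lived secret, so it should be readable only by the submitting user.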
Questions?