Hadoop Elephant in Active Directory Forest
Marek Gawiński, Arkadiusz Osiński, Allegro Group
Agenda
● Goals and motivations
● Technology stack
● Architecture evolution
● Automatically integrating new servers
● Making AD users and groups visible to Linux
● Making the architecture resilient to AD service unavailability
● Auto-deployment of client software on desktops
Allegro Hadoop cluster in numbers
4 terabytes RAM
2 petabytes disk space
47 datanodes
79 projects
612 users
Goals and motivations
● Secured cluster
● Central authentication and authorisation
● Compliance for real and project users and groups
● Cluster resources available from desktops
● Integrating new servers automatically
● Making the whole architecture resilient to AD failures or timeouts
● Auto-deployment and autoconfiguration of Hadoop client software on users' desktops
Technology stack
● Cloudera CDH5
● MIT Kerberos
● Microsoft Active Directory
● FreeIPA
● sssd
● Puppet
● msktutil
● Hadoop desktop client
History - FreeIPA+FreeIPA Kerberos
Diagram: Client; Secured Hadoop cluster; FreeIPA (users, local groups management); Kerberos KDC
Flows: User/pass; Kerberos service ticket; Check user/pass; Internal Hadoop creds; Check groups
History - FreeIPA+own Kerberos
Diagram: Client; Secured Hadoop cluster; FreeIPA (users, local groups management); Kerberos KDC; Kerberos KDC MIT
Flows: User/pass; Kerberos service ticket; Check user/pass; Internal Hadoop creds; Check groups
History - FreeIPA+own Kerberos+AD
Diagram: Client; Secured Hadoop cluster; FreeIPA (users, local groups management); Kerberos KDC MIT; AD User&Groups; AD Kerberos
Flows: User/pass; Kerberos service ticket; Check user/pass; Internal Hadoop creds; Check groups
Final - own Kerberos+AD
Diagram: Client; Secured Hadoop cluster; Kerberos KDC MIT; AD User&Groups; AD Kerberos
Flows: User/pass; Kerberos service ticket; Check user/pass; Internal Hadoop creds; Check groups
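In the final setup users authenticate with their AD credentials, while the HDFS listing later in this deck shows files owned by the short name `marek.gawinski`. Hadoop typically derives such short names from Kerberos principals through `hadoop.security.auth_to_local` rules, for example `RULE:[1:$1@$0](.*@AD\.REALM)s/@.*//`; the deck does not show the rules actually used. The following is a minimal sketch that only mimics the effect of such a rule for single-component user principals. The function name `short_name` and the realm `HADOOP.REALM` are hypothetical, and this is not Hadoop's implementation.

```shell
#!/usr/bin/env bash
# Sketch only: mimic the effect of an auth_to_local-style rule that maps
# single-component principals from trusted realms to short user names.
# Realm names used with it are hypothetical placeholders.

# short_name PRINCIPAL TRUSTED_REALM...
# Prints the short name and returns 0 if the principal has one component and
# its realm is in the trusted list; otherwise prints nothing and returns 1.
short_name() {
  local principal="$1"
  shift
  # A realm part is required.
  [[ "$principal" == *@* ]] || return 1
  local name="${principal%@*}"
  local realm="${principal##*@}"
  # Service principals such as hdfs/nn1.example.com have two components.
  [[ -n "$name" && "$name" != */* ]] || return 1
  local trusted
  for trusted in "$@"; do
    if [[ "$realm" == "$trusted" ]]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}
```

For example, `short_name marek.gawinski@AD.REALM AD.REALM HADOOP.REALM` prints `marek.gawinski`, while a service principal or a principal from an untrusted realm yields nothing.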
Integrating new Linux servers automatically with AD
Diagram: Msktutil; AD User&Groups; AD Kerberos; Kerberos keytab
Steps: Create user; Create principal
define get_ad_keytab ( $path = '', ...) {
  ...
  $realm     = 'SOME_REALM'
  $pass      = hiera('hadoop_prod/ad/krb_manager_pass')
  $principal = "${title}/${host}@${realm}"
  $command   = "echo ${pass} | kinit _hadoop_manager@${realm}; \
    /usr/local/bin/add_ad_princ.sh ${title} ${host} ${path}; kdestroy"
  ...
msktutil -c -s $PRINCIPAL --upn $PRINCIPAL -k $KEYTAB \
  --computer-name $COMPUTER_NAME \
  --server $SERVER_KRB \
  --realm $REALM \
  -b $USER_LDAP_ROOT \
  --dont-expire-password \
  --description "\"$DESCRIPTION\"" \
  --user-creds-only
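The Puppet define obtains a ticket as `_hadoop_manager` and then calls `add_ad_princ.sh`, whose body is not shown; its core is the `msktutil` call above, where `--user-creds-only` makes msktutil use that user ticket rather than a machine account. A minimal sketch of how such a wrapper might assemble the arguments follows. The function names `build_msktutil_args` and `run_msktutil`, the variables `REALM`, `KRB_SERVER`, `LDAP_BASE` and `DRY_RUN`, their default values and the computer-name scheme are all hypothetical. Here the principal is passed without the realm, which is supplied through `--realm`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an add_ad_princ.sh-style wrapper.
# REALM, KRB_SERVER and LDAP_BASE are assumed environment variables;
# the defaults below are placeholders, not real values.

# build_msktutil_args SERVICE HOST KEYTAB
# Fills the global array MSKTUTIL_ARGS with the options used on the slide.
build_msktutil_args() {
  local service="$1" host="$2" keytab="$3"
  local realm="${REALM:-EXAMPLE.COM}"
  local principal="${service}/${host}"
  # Hypothetical naming scheme for the AD account: service plus short hostname.
  local computer_name="${service}-${host%%.*}"

  MSKTUTIL_ARGS=(
    -c
    -s "$principal"
    --upn "$principal"
    -k "$keytab"
    --computer-name "$computer_name"
    --server "${KRB_SERVER:-dc1.example.com}"
    --realm "$realm"
    -b "${LDAP_BASE:-OU=hadoop}"
    --dont-expire-password
    --description "${service} on ${host}"
    --user-creds-only
  )
}

# run_msktutil SERVICE HOST KEYTAB
# Prints the shell-quoted command when DRY_RUN=1, otherwise executes it.
run_msktutil() {
  build_msktutil_args "$@" || return
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' msktutil "${MSKTUTIL_ARGS[@]}"
    printf '\n'
  else
    msktutil "${MSKTUTIL_ARGS[@]}"
  fi
}
```

Building the arguments into an array keeps values with spaces (such as the description) intact without the nested quoting used on the slide, and `DRY_RUN=1 run_msktutil HTTP nn1.example.com /tmp/http.keytab` shows the command before anything is written to AD.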
root@nn1:~# klist -ket
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/[email protected] (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (aes128-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (des3-cbc-sha1)
   1 08/17/2015 13:26:45 host/[email protected] (arcfour-hmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia128-cts-cmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia256-cts-cmac)
   4 08/17/2015 13:30:23 [email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 [email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 [email protected] (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 host/[email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (aes256-cts-hmac-sha1-96)
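The listing holds `host/` entries at KVNO 1 with six encryption types alongside entries at KVNO 4 that carry only three. A small sketch, assuming `klist -ket` output in the format above, prints the encryption types stored for one principal at its highest KVNO, which is a quick way to check what a freshly written keytab actually contains. The function name `keytab_enctypes` and the example principal are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `klist -ket` output on stdin and print the encryption types
# stored for PRINCIPAL at its highest KVNO, space-separated.

# keytab_enctypes PRINCIPAL
keytab_enctypes() {
  local principal="$1"
  awk -v p="$principal" '
    BEGIN { max = -1 }
    # Data rows look like: KVNO DATE TIME PRINCIPAL (ENCTYPE)
    $1 ~ /^[0-9]+$/ && $4 == p {
      kvno = $1 + 0
      enc = $5
      gsub(/[()]/, "", enc)
      if (kvno > max) { max = kvno; list = enc }
      else if (kvno == max) { list = list " " enc }
    }
    END { if (list != "") print list }
  '
}
```

Typical use on a node is `klist -ket /etc/krb5.keytab | keytab_enctypes host/nn1.example.com@AD.REALM`.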
Separate subtree in the AD structure
System Security Services Daemon
● Identity and authentication
● Multiple providers (FreeIPA, LDAP, AD)
● High availability for backends
● Provides PAM and NSS modules
● Caching
● > 1.11.x: stable support for AD forest authentication
AD schema with no modifications
/etc/sssd/sssd.conf
[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD.REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/AD.REALM/%u
default_shell = /bin/false
ldap_referrals = false
root@nn1:~# id _hc_tech_prod | tr "," "\n"
uid=1827653611(_hc_tech_prod) gid=1827600513(domain users) groups=1827600513(domain users)
1827652945(_gr_hc_users_common)
1827647474(_gr_hc_hadoop_prod)
1827652940(_gr_hc_project1_prod)
1827652919(_gr_hc_project2_prod)
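With `ldap_id_mapping = True`, SSSD computes UIDs and GIDs from AD SIDs instead of reading POSIX attributes from AD, which is why the AD schema needs no modifications. SSSD hashes the domain SID to pick an ID slice and adds each object's RID to that slice's base. The numbers above fit this with SSSD's default range settings: gid 1827600513 is the slice base 1827600000 plus 513, the well-known RID of Domain Users. The sketch below reverses that arithmetic. It assumes the defaults `ldap_idmap_range_min = 200000` and `ldap_idmap_range_size = 200000` and RIDs smaller than the slice size (larger RIDs spill into secondary slices, which it does not handle); the function name `decompose_idmap` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split a POSIX ID produced by SSSD's algorithmic ID mapping into
# slice number, slice base and RID, assuming default range settings and an
# object whose RID falls inside the domain's primary slice.

# decompose_idmap ID [RANGE_MIN] [RANGE_SIZE]
decompose_idmap() {
  local id="$1" min="${2:-200000}" size="${3:-200000}"
  if [[ ! "$id" =~ ^[0-9]+$ ]] || (( id < min )); then
    echo "ID '$id' is not inside the mapping range" >&2
    return 1
  fi
  local slice=$(( (id - min) / size ))
  local base=$(( min + slice * size ))
  printf 'slice=%d base=%d rid=%d\n' "$slice" "$base" "$(( id - base ))"
}
```

For the user above, `decompose_idmap 1827653611` prints `slice=9137 base=1827600000 rid=53611`: every ID in the listing shares the same base, as expected for objects from a single AD domain.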
Making the whole architecture resilient to failures
/etc/sssd/sssd.conf
[nss]
memcache_timeout = 3600
Diagram: Local filesystem NSS cache; Active: closest DC; Fallback: servers in remote DC
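With `memcache_timeout = 3600`, records stay valid in sssd's in-memory NSS cache for up to an hour, and `ad_backup_server` covers the remote DC. Whether lookups keep working when the AD servers are slow or unreachable can be probed from any node. A minimal sketch, assuming GNU coreutils `timeout` and glibc `getent` are available; the function name `check_nss_user` and the 5-second default budget are hypothetical choices. `getent` exits with status 2 when the key is not found, and `timeout` exits with 124 when the time budget is exceeded.

```shell
#!/usr/bin/env bash
# Sketch: probe NSS (and therefore sssd and its caches) for one user
# within a time budget.

# check_nss_user USER [SECONDS]
# Returns getent's exit status, or 124 if the lookup timed out.
check_nss_user() {
  local user="$1" budget="${2:-5}" rc=0
  timeout "$budget" getent passwd "$user" >/dev/null || rc=$?
  case "$rc" in
    0)   echo "ok: $user resolved" ;;
    2)   echo "not found: $user" >&2 ;;
    124) echo "timeout after ${budget}s: $user" >&2 ;;
    *)   echo "lookup error ($rc): $user" >&2 ;;
  esac
  return "$rc"
}
```

Running something like `check_nss_user _hc_tech_prod` from monitoring makes a degraded AD path visible as a timeout rather than as failing Hadoop jobs.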
Auto-deployment and autoconfiguration on desktops
● Install script for the Hadoop client on desktops
● Refreshing configs from the current production environment
● Support for HDFS/YARN/Hive/Spark
[marek.gawinski:~/ALLEHADOOP] $ sh env.sh
Password for [email protected]: **************
[marek.gawinski:~/ALLEHADOOP] $ klist
Ticket cache: FILE:/tmp/krb5cc_1511317717
Default principal: [email protected]
Valid starting     Expires            Service principal
09/04/15 23:31:35  09/05/15 09:31:35  krbtgt/[email protected]
	renew until 09/11/15 23:31:33
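The contents of `env.sh` are not shown; the session only indicates that it obtains a Kerberos ticket through a password prompt. A minimal sketch of such a script follows, assuming a hypothetical layout under one base directory; the function names, directory names and the principal in the usage note are illustrative. `HADOOP_CONF_DIR`, `HIVE_CONF_DIR` and `KRB5_CONFIG` are the standard variables read by the Hadoop, Hive and MIT Kerberos clients, and `klist -s` exits non-zero when the cache holds no valid ticket.

```shell
#!/usr/bin/env bash
# Hypothetical env.sh-style helper for desktop Hadoop clients.
# The directory layout under BASE is an assumption for illustration.

# setup_hadoop_env BASE
# Points the Hadoop, Hive and Kerberos clients at configs shipped under BASE.
setup_hadoop_env() {
  local base="$1"
  export HADOOP_CONF_DIR="$base/conf/hadoop"
  export HIVE_CONF_DIR="$base/conf/hive"
  export KRB5_CONFIG="$base/conf/krb5.conf"
  # Prepend the bundled client binaries (hdfs, hive, ...) to PATH.
  export PATH="$base/bin:$PATH"
}

# ensure_ticket PRINCIPAL
# Runs kinit (which prompts for the password) only if no valid ticket exists.
ensure_ticket() {
  local principal="$1"
  if ! klist -s 2>/dev/null; then
    kinit "$principal"
  fi
}
```

Exported variables only persist in the calling shell when the script is sourced, so this variant would be loaded with `. ./env.sh` followed by `setup_hadoop_env "$PWD"` and `ensure_ticket user@AD.REALM`. The ticket cache itself is a file, so `kinit` takes effect even when a script runs as `sh env.sh`.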
[marek.gawinski:~/ALLEHADOOP] $ hive
hive (default)> show databases;
OK
database_name
tpch_benchmarks
...
xwing_poc
Time taken: 0.816 seconds, Fetched: 72 row(s)
hive (default)> set hive.execution.engine = tez;
hive (default)> select count(*) from table1;
[marek.gawinski:~/ALLEHADOOP] $ hdfs dfs -ls
Found 8 items
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-06 02:00 .Trash
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-28 21:01 .hiveJars
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-09 10:43 .sparkStaging
drwx------   - marek.gawinski hadoop          0 2015-05-22 02:35 .staging
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-31 13:11 oozie1
-rw-r--r--   3 marek.gawinski hadoop         43 2015-05-26 15:26 ozzietest1.hql
-rw-r--r--   3 marek.gawinski hadoop         13 2015-08-31 12:30 pwd.txt
drwxr-xr-x   - marek.gawinski hadoop          0 2015-04-16 16:21 tables
Benefits
● One standard for access control to all company resources
● Every new employee can automatically start using Hadoop with no additional effort
● One password to all systems
Thank you!
Questions?