April 2014 HUG : Integrating HUE with Multi-tenant cluster

50
INTEGRATE HUE WITH YOUR HADOOP CLUSTER Romain Rigaux Y! HUG Apr 16, 2014

description

April 2014 HUG : Integrating HUE with Multi-tenant cluster

Transcript of April 2014 HUG : Integrating HUE with Multi-tenant cluster

Page 1: April 2014 HUG : Integrating HUE with Multi-tenant cluster

INTEGRATE HUE WITH YOUR HADOOP CLUSTERRomain RigauxY! HUG Apr 16, 2014

Page 2: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHATIS HUE?

WEB INTERFACE FOR MAKING HADOOP EASIER TO USE Suite of apps for each Hadoop component, like Hive, Pig, Impala, Oozie, Solr, Sqoop2, HBase...

Page 3: April 2014 HUG : Integrating HUE with Multi-tenant cluster

VIEW FROM30K FEET

Hadoop Web Server You and eventhat friend

that uses IE9 ;)

Page 4: April 2014 HUG : Integrating HUE with Multi-tenant cluster

YARN JobTracker Oozie

Pig

HDFS

HiveServer2

Hive Metastore

Cloudera Impala

Solr

HBase

Sqoop2

Zookeeper

LDAP SAML

Hue Plugins

ECOSYSTEMAND APPS

Page 5: April 2014 HUG : Integrating HUE with Multi-tenant cluster

TARGETOF HUE

GETTING STARTED WITH HADOOP BEING PRODUCTIVE EXPLORING DIFFERENT ANGLES OF THE PLATFORM !

LET ANY USER FOCUS ON BIG DATA PROCESSINGBEING COMPATIBLE WITH ANY HADOOP VERSION (0.20/1.2.0/2.3.0)

Page 6: April 2014 HUG : Integrating HUE with Multi-tenant cluster

OPEN SOURCE

~3000 COMMITS 33 CONTRIBUTORS648 STARS212 FORKS !

github.com/cloudera/hue

Page 7: April 2014 HUG : Integrating HUE with Multi-tenant cluster

THE CORETEAM PLAYERS

team.gethue.com

ABRAHAM ELMAHREK

ROMAIN RIGAUX

ENRICO BERTI

CHANG BEER

Page 8: April 2014 HUG : Integrating HUE with Multi-tenant cluster

TALKS

Meetups and events in NYC, Paris, LA, Tokyo, SF, Stockholm, Vienna, San Jose, Singapore…Coming up in London, West coast

AROUNDTHE WORLD

RETREATS

Nov 13 Koh Chang, Thailand May 14 Curaçao, Netherlands Antilles

Page 9: April 2014 HUG : Integrating HUE with Multi-tenant cluster

FAST PACE

LAST 30 DAYS

41 issues created and 38 resolved. Core team + Community

Page 11: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HISTORY

HUE 1

Desktop-like in a browser, did its job but pretty slow, memory leaks and not very IE friendly but definitely advanced for its time (2009-2010).

Page 12: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HISTORY

HUE 2

The first flat structure port, with Twitter Bootstrap all over the place.

Page 13: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HISTORY

HUE 2.5

New apps, improved the UX adding new nice functionalities like autocomplete and drag & drop.

Page 14: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HISTORY

HUE 3 ALPHA

Proposed design, didn’t make it.

Page 15: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HISTORY

HUE 3.5+

Where we are now, new UI, several new apps, the most user friendly features to date.

Page 16: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHICH VERSION TO USE?

6 months 1k commits later1-2 years old

HUE 2.X HUE 3.X HUE 3.5 + 1/2 3.6

Page 17: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHICH DISTRIBUTION?

Advanced preview The most stable and cross component checked

Very latest

GITHUB CDH / CM TARBALL

HACKER ADVANCED USER NORMAL USER

Page 18: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHERE TO PUT HUE? IN ONE MACHINE

Page 19: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHERE TO PUT HUE? INSIDE THE CLUSTER

Page 20: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHERE TO PUT HUE? OUTSIDE THE CLUSTER

Page 21: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHAT DO YOU NEED?

Python 2.4 2.6 That’s it if using a packaged version. If building from the source, here are the extra packages

SERVER CLIENT

Web Browser IE 9+, FF 10+, Chrome, Safari

Page 22: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW DOES THE HUE SERVICE LOOK LIKE?

Process serving pages and also static content

1 SERVER 1 DB

For cookies, saved queries, workflows, …

Page 23: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW TO CONFIGURE HUE

HUE.INI

Similar to core-site.xml but with .INI syntax !

Where?

/etc/hue/conf/hue.ini

or

$HUE_HOME/desktop/conf/

pseudo-distributed.ini

[desktop] [[database]] # Database engine is typically one of: # postgresql_psycopg2, mysql, or sqlite3 engine=sqlite3 ## host= ## port= ## user= ## password= name=desktop/desktop.db

Page 24: April 2014 HUG : Integrating HUE with Multi-tenant cluster

AUTHENTICATE / LOGIN

[desktop] [[auth]] # - django.contrib.auth.backends.ModelBackend (entirely Django backend) # - desktop.auth.backend.AllowAllBackend (allows everyone) # - desktop.auth.backend.AllowFirstUserDjangoBackend # - desktop.auth.backend.LdapBackend # - desktop.auth.backend.OAuthBackend # ... ## backend=desktop.auth.backend.AllowFirstUserDjangoBackend

Page 25: April 2014 HUG : Integrating HUE with Multi-tenant cluster

USERS

Can give and revoke permissions to single users or group of users

ADMIN USER

Regular user + permissions

Page 26: April 2014 HUG : Integrating HUE with Multi-tenant cluster

DB BACKEND

Page 27: April 2014 HUG : Integrating HUE with Multi-tenant cluster

LDAP BACKEND

Integrate your employees: LDAP How to guide

Page 28: April 2014 HUG : Integrating HUE with Multi-tenant cluster

LIST OF GROUPS AND PERMISSIONS

A permission can: - allow access to one app

(e.g. Hive Editor) - modify data from the app

(e.g drop Hive Tables or edit cells in HBase Browser)

CONFIGURE APPSAND PERMISSIONS

A list of permissions

Page 29: April 2014 HUG : Integrating HUE with Multi-tenant cluster

PERMISSIONS IN ACTION

User ‘test’ belonging to the group ‘hiveonly’ that has just the ‘hive’ permissions

CONFIGURE APPSAND PERMISSIONS

Page 30: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW HUE INTERACTSWITH HADOOP

YARN

JobTracker

Oozie

Hue Plugins

LDAP SAML

Pig

HDFS HiveServer2

Hive Metastore

Cloudera Impala

Solr

HBase

Sqoop2

Zookeeper

Page 31: April 2014 HUG : Integrating HUE with Multi-tenant cluster

RCP CALLS TO ALLTHE HADOOP COMPONENTS

HDFS EXAMPLE

WebHDFS REST

DN

DN

DN

DN

NN

http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS

Page 32: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW

Host/port of all services like Oozie, Yarn, HDFS, HBase… APIs are specified in hue.ini on sections, e.g. [hbase] by major service, Hue core [desktop] or Hue lib [liboozie]

[hbase] # Comma-separated list of HBase Thrift servers for # clusters in the format of '(name|host:port)'. hbase_clusters=(Cluster|localhost:9090) !

[liboozie] # The URL where the Oozie service runs on. # oozie_url=http://hue.ent.cloudera.com:11000/oozie

RCP CALLS TO ALLTHE HADOOP COMPONENTS

Full list

Page 33: April 2014 HUG : Integrating HUE with Multi-tenant cluster

KERBEROS

1 Hue ticket/ principal - no user ticket !

Hue uses its ticket for authenticating to every other service (HDFS, Oozie, …)

read more on the Hue Security Guide

Page 34: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HUE KERBEROS TICKET

kadmin: addprinc -randkey hue/[email protected]

Add Hue user principal to Kerberos

$ kinit -k -t /etc/hue/hue.keytab hue/[email protected]

Test

Ticket should be renewable (krb5.conf and kdc.conf)

[desktop] [[kerberos]] # Path to Hue's Kerberos keytab file hue_keytab=/etc/hue/hue.keytab # Kerberos principal name for Hue hue_principal=hue/FQDN@REALM # add kinit path for non root users kinit_path=/usr/kerberos/bin/kinit

hue.ini

Page 35: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW

Hue is a “super proxy” Client could be on a Windows machine, phone… and interact with all the Hadoop services

http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue&doas=bob

IMPERSONATION

<!-- Hue WebHDFS proxy user setting --><property> <name>hadoop.proxyuser.hue.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hue.groups</name> <value>*</value> </property>

Call for getting the information about an HDFS file

WebHDFS, add to core-site.xml

Page 36: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HTTPS SSL DB SSL WITH HIVESERVER2

READ MORE … AUDITING

OTHER SECURITYFEATURES

Page 37: April 2014 HUG : Integrating HUE with Multi-tenant cluster

2 Hue instances HA proxy Multi DB Performances: like a website, mostly RPC calls

HIGH AVAILABILITY

HOW

Page 38: April 2014 HUG : Integrating HUE with Multi-tenant cluster

DEMO TIME

Page 40: April 2014 HUG : Integrating HUE with Multi-tenant cluster

CONFIGURATIONS ARE HARD…

…GIVE CLOUDERA MANAGER A TRY!

vimeo.com/91805055

Page 41: April 2014 HUG : Integrating HUE with Multi-tenant cluster

MISSEDSOMETHING?

learn.gethue.com

Page 42: April 2014 HUG : Integrating HUE with Multi-tenant cluster

LINKS

TWITTER

@gethue

USER GROUP

hue-user@

WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com

Page 43: April 2014 HUG : Integrating HUE with Multi-tenant cluster

GET HUE

Try in advance the latest and greatest but you’ll have to configure everything on your own.

Get to play with Hue and various Hadoop components in 5 minutes. It’s a self contained CDH environment ready to use.

Newer version than HDP, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search.

The newest addition, ships Hue 3.0 through the GreenButton products.

Stable and highly tested releases perfectly integrated with the Hadoop ecosystem, automagically configured by Cloudera Manager.

In HDP there’s an old forked version of Hue 2.3.

CLOUDERA’S CDH TARBALL CLOUDERA’S DEMO VM

HORTONWORKS* MAPR* HP CLOUD*

* YOUR MILEAGE MAY VARY.

BIGTOP EMBEDDED/DEMO IN IND. COMPANIES

Page 44: April 2014 HUG : Integrating HUE with Multi-tenant cluster

WHAT ARE YOUR USE CASES?

WHICH COMPONENTS DO YOU USE?

WHAT WOULD YOU LIKE TO SEE IN HUE?

INTERESTED IN CONTRIBUTING? WANNA SAY HELLO? DO YOU WANT A TAILOR

MADE TEAM RETREAT?

QUESTIONS?

TEAM@ GETHUE.COM

Page 45: April 2014 HUG : Integrating HUE with Multi-tenant cluster

THANK YOU!

gethue.com

Page 46: April 2014 HUG : Integrating HUE with Multi-tenant cluster

APPENDIX

Page 47: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW

Add Hue as WebHDFS proxy user setting like 3 slides ago Add the property on the right in hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes

<property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property>

HDFS FILE BROWSER

[hadoop] [[hdfs_clusters]] # HA support by using HttpFs ! [[[default]]] # Enter the filesystem uri ##fs_defaultfs=hdfs://localhost:8020 ! # Use WebHdfs/HttpFs as the communication mechanism. ##webhdfs_url=http://localhost:50070/webhdfs/v1

hdfs-site.xml

hue.ini

Page 48: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW

Example of config for having Hue interact with Yarn

[hadoop] [[yarn_clusters]] ! [[[default]]] # Enter the host on which you are running the ResourceManager resourcemanager_host=localhost ! # The port where the ResourceManager IPC listens on ## resourcemanager_port=8032 ! # Whether to submit jobs to this cluster submit_to=True ! # Change this if your YARN cluster is Kerberos-secured ## security_enabled=false ! # URL of the ResourceManager API ## resourcemanager_api_url=http://localhost:8088 ! # URL of the ProxyServer API ## proxy_api_url=http://localhost:8088 ! # URL of the HistoryServer API # history_server_api_url=http://localhost:19888 ! [[[ha]]] # Enter the host on which you are running the failover Resource Manager resourcemanager_api_url=http://localhost:8088 ## logical_name= submit_to=True

YARN / MR2

Page 49: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW

Based on HiveServer2 interface !

Note for Hive: <property> <name>hive.server2.enable.doAs</name> <value>true</value> </property> !

Video demoSetup tutorial

[beeswax] # Host where Hive server Thrift daemon is running. # If Kerberos security is enabled, use fully-qualified domain name (FQDN). ## hive_server_host=localhost ## hive_server_port=10000 ! # Hive configuration directory, where hive-site.xml is located ## hive_conf_dir=/etc/hive/conf

HIVE (IMPALA / SHARK)

Page 50: April 2014 HUG : Integrating HUE with Multi-tenant cluster

HOW

Make sure share lib is installed !

Alternative Dashboard and Editors

[liboozie] #oozie_url=http://localhost.com:11000/oozie

OOZIE

HOW

Comes with Oozie, no PigServer yet Oozie sharelib Oozie credentials for security

PIG